BAYESIAN ANALYSIS IN MOMENT INEQUALITY MODELS

By Yuan Liao and Wenxin Jiang 

Northwestern University 

This paper presents a study of the large-sample behavior of the 
posterior distribution of a structural parameter which is partially 
identified by moment inequalities. The posterior density is derived 
based on the limited information likelihood. The posterior distribution
converges to zero exponentially fast on any $\delta$-contraction outside
the identified region. Inside, it is bounded below by a positive con- 
stant if the identified region is assumed to have a nonempty interior. 
Our simulation evidence indicates that the Bayesian approach has ad- 
vantages over frequentist methods, in the sense that, with a proper 
choice of the prior, the posterior provides more information about 
the true parameter inside the identified region. We also address the 
problem of moment and model selection. Our optimality criterion is 
the maximum posterior procedure and we show that, asymptotically, 
it selects the true moment/model combination with the most moment 
inequalities and the simplest model. 

1. Introduction. 

1.1. Formulation of the problems. Let $(\Omega, \mathcal{A}, P)$ denote a probability
space. Suppose that we are interested in some structural parameter $\theta_0$
that satisfies a set of moment inequality conditions:

$$E\, m_j(X, \theta_0) \ge 0, \quad j = 1, \dots, p, \tag{1.1}$$

where $m_j(\cdot, \theta)$, $j = 1, \dots, p$, are known real-valued moment functions. $X$ is
an observable random vector defined on $(\Omega, \mathcal{A}, P)$, and we assume that we
observe i.i.d. or stationary realizations $X^n = \{X_1, \dots, X_n\}$ of $X$. A model
that is characterized by the moment inequalities (1.1) is usually called a moment
inequality model.

A key feature of moment inequality models is that $\theta_0$ is not necessarily
point identified: there may exist more than one solution to the inequalities in
(1.1) if $E\, m_j(X, \theta_0)$ is viewed as a function of $\theta_0$. In other words, if we let $\Theta$
be the parameter space that contains $\theta_0$ and define

$$\Omega = \{\theta \in \Theta : E\, m_j(X, \theta) \ge 0, \ j = 1, \dots, p\}, \tag{1.2}$$

then $\Omega$ can be a nonsingleton set. In this case, we say that $\theta_0$ is partially
identified on $\Omega$, and $\Omega$ is called the identified region.

Many partially identified models are characterized by such moment in- 
equalities, where the parameter of interest is only partially identified and 
therefore cannot possibly be consistently estimated. In this framework, since 
the identified region captures all of the information about the parameter, it 
becomes one of the most interesting subjects of study in moment inequality 
models [see, e.g., Chernozhukov, Hong and Tamer (2007), CHT, hereafter]. 

In addition to the problem of studying the identified region, there is also 
a moment/model selection problem in moment inequality models. Suppose 
that we have p candidate moment inequalities 

$$E\, m_j(X, \theta) \ge 0, \quad j = 1, \dots, p,$$

with a $k$-dimensional parameter vector $\theta = (\theta_1, \dots, \theta_k)^T$ that belongs to the
parameter space $\Theta_1 \times \cdots \times \Theta_k$. The moment selection problem refers to selecting
the best subset of the moment inequalities among all of the possible
candidates, while the model selection procedure addresses the problem of
selecting the best model that is characterized by setting some components
of the parameter to be zero. Such a candidate model can be a parameter
subspace like $\{0\} \times \Theta_2 \times \cdots \times \Theta_k$. Therefore, the moment/model selection
procedure produces a combination of moment inequalities and a parameter 
subspace. For instance, in Example 1.3 regarding the instrumental variable 
regression with interval censoring model, the moment selection problem can 
correspond to selecting the instrumental variables (components of Z), while 
the model selection problem is related to selecting the useful explanatory 
variables (components of X) that have nonzero regression coefficients. Ulti- 
mately, the selected combination should achieve some sense of optimality. 

1.2. Some motivating examples. There are several interesting examples 
for the moment inequality models described above, where the parameter of 
interest is identified on a nonsingleton set. 

Example 1.1 [Interval censored data; see, e.g., Example 1 of CHT (2007)].
Let $Y$ be a real-valued random variable which lies in $[Y_1, Y_2]$ almost surely;
$Y_1$ and $Y_2$ are observed random variables, but $Y$ is not observed. (Sometimes
one may assume that $Y_2 = Y_1 + 1$, as in the case where $Y_1$ is the recorded
integer part of $Y$.) The parameter of interest is $\theta_0 = E(Y)$. We then have
the following moment inequalities:

$$E(Y_2 - \theta_0) \ge 0, \qquad E(\theta_0 - Y_1) \ge 0.$$

Then $\theta_0$ is partially identified on $\Omega = [EY_1, EY_2]$.
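As an illustration of Example 1.1, the following minimal sketch uses a hypothetical design (not one taken from the paper): $Y$ is drawn from uniform$[0, 10]$, $Y_1$ is its recorded integer part and $Y_2 = Y_1 + 1$. It returns the sample analog $[\bar Y_1, \bar Y_2]$ of $\Omega = [EY_1, EY_2]$; the function name `interval_censored_region` is illustrative only.

```python
import numpy as np


def interval_censored_region(n, seed=0):
    """Sample analog [mean(Y1), mean(Y2)] of Omega = [E Y1, E Y2] (Example 1.1).

    Hypothetical design: Y ~ uniform[0, 10] is latent, Y1 = floor(Y) is its
    recorded integer part and Y2 = Y1 + 1, so Y1 <= Y <= Y2 almost surely.
    """
    rng = np.random.default_rng(seed)
    y = rng.uniform(0.0, 10.0, size=n)  # latent Y, never observed directly
    y1 = np.floor(y)
    y2 = y1 + 1.0
    return y1.mean(), y2.mean()


lo, hi = interval_censored_region(100_000)
# In this design E Y = 5, E Y1 = 4.5 and E Y2 = 5.5, so Omega = [4.5, 5.5].
print(lo, hi)
```

The width of the estimated interval is exactly one here because $Y_2 - Y_1 = 1$ for every observation; only the location of the interval is random.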
Example 1.2 [Missing data; see, e.g., Example 1 of Canay (2008)]. Assume
that $(Y, Z) \in [0, 1] \times \{0, 1\}$ and that $Y$ is observed only when $Z = 1$.
Suppose that we are interested in the parameter $\theta_0 = EY$, which corresponds
to $E[ZY + (1 - Z)Y - \theta_0] = 0$, where $ZY$ is observed but $(1 - Z)Y$ is not. Noting
that $0 \le (1 - Z)Y \le 1 - Z$, we have the moment inequalities

$$E(\theta_0 - ZY) \ge 0, \qquad E(ZY - \theta_0 + 1 - Z) \ge 0.$$

Then $\theta_0$ is partially identified on $\Omega = [E(ZY), E(ZY) + 1 - EZ]$.

Example 1.3 [Interval regression model; see, e.g., Example 2 of CHT
(2007)]. Consider $Y, Y_1, Y_2$ as in the setup of Example 1.1, and assume that
the conditional mean of the unobserved $Y$ is $X^T\theta_0$, where $\theta_0$ is the parameter
of interest and $X$ is a regressor vector. Since $Y_1 \le Y \le Y_2$, we then have the
moment inequalities

$$E\, Z(Y_2 - X^T\theta_0) \ge 0, \qquad E\, Z(X^T\theta_0 - Y_1) \ge 0,$$

where $Z$ is a vector of positive functions of $X$ or of positive instrumental
variables.

1.3. Literature review and contributions of this paper. Many frequentist
inference procedures for the identified region as well as the true parameter
have been developed in this growing area of research. For example, Chernozhukov,
Hong and Tamer (2007) construct an econometric criterion function
whose set of minimizers forms the identified region. They consider
consistent estimation of the identified region and show that their set
estimator is consistent in Hausdorff distance. Additionally, they derive the
convergence rate of their estimator and construct confidence sets for the
identified region. Moreover, Andrews and Soares (2007) develop confidence
sets for the identified region and a test of the moment inequalities/equalities
based on generalized moment selection. Among others, Rosen (2008) provides
a formulation of criterion functions that differs from CHT and derives
analytical critical values for the confidence region. Beresteanu and Molinari
(2008) recently proposed inference procedures for the case where the identified
region can be written as a transformation of the Aumann expectation, based on
random set theory. Some additional papers in the literature that consider inference
with partially identified models include Pakes et al. (2006), Andrews and
Jia (2008), Romano and Shaikh (2008), Bugni (2007), Horowitz and Manski
(2000), Manski and Tamer (2002), Canay (2008) and Liu and Shao (2003).

This paper studies a Bayesian approach to the moment inequality models.
The Bayesian procedure provides distributional information for the partially
identified parameter, both inside and outside the identified region, through
its posterior distribution. The advantages of using posterior distributions
to characterize the parameters are many. First, as pointed out by Poirier
(1998), a Bayesian analysis of partially identified models is always possible
if a proper prior for the parameters is specified. If we have some a priori
information on $\theta_0$, then, by using a properly chosen prior distribution, the
resulting posterior density may not be flat within the identified region; this
provides evidence that the parameter is more likely to lie in some particular
area. Second, even with a flat prior distribution, when $\theta_0$ is multidimensional,
the posterior density of some components of $\theta_0$ may no longer be
flat, due to the shape of the identified region. Hence, if we are interested
in these components of $\theta_0$, then the posterior density can still provide extra
information on their locations within the identified region. As a third
advantage, it can be shown that, asymptotically, the posterior density has
support only on the identified region. Because it contains more information, a
posterior density can always be used to estimate the identified region, but not
vice versa. Finally, MCMC methods are a powerful way to draw
samples from the posterior, and these samples can be used to approximate
posterior statistics. In addition, the posterior samples
can also be used in frequentist methods to estimate the identified region, for
example, by minimizing the econometric criterion function of CHT.
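As a sketch of the MCMC step, the code below implements a plain random-walk Metropolis sampler, one common choice among many; the paper does not prescribe a particular algorithm here, and the function names are illustrative. The sampler needs only an unnormalized log posterior such as $\ln p(\theta) + \ln L(\theta)$ that returns $-\infty$ outside the support of the prior. The usage lines target a hypothetical flat density on $[0, 1]$, mimicking a flat posterior on an identified region.

```python
import numpy as np


def random_walk_metropolis(log_post, theta0, n_draws, step, seed=0):
    """Random-walk Metropolis sampler for an unnormalized log density.

    log_post: callable returning ln p(theta) + ln L(theta) up to a constant,
              or -np.inf outside the support of the prior.
    theta0:   starting point with finite log_post(theta0).
    step:     standard deviation of the Gaussian random-walk proposal.
    """
    rng = np.random.default_rng(seed)
    theta = np.atleast_1d(np.asarray(theta0, dtype=float))
    current = log_post(theta)
    draws = np.empty((n_draws, theta.size))
    for t in range(n_draws):
        proposal = theta + step * rng.standard_normal(theta.size)
        candidate = log_post(proposal)
        # Accept with probability min(1, exp(candidate - current));
        # proposals with candidate = -inf are always rejected.
        if np.log(rng.uniform()) < candidate - current:
            theta, current = proposal, candidate
        draws[t] = theta
    return draws


def flat_on_unit_interval(theta):
    # Hypothetical target: constant log density on [0, 1], -inf elsewhere.
    return 0.0 if 0.0 <= theta[0] <= 1.0 else -np.inf


draws = random_walk_metropolis(flat_on_unit_interval, [0.5], 20_000, step=0.3)
print(draws.mean(), draws.min(), draws.max())
```

Because only differences of log densities enter the acceptance step, the normalizing constant of the posterior is never needed.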

In fact, Bayesian methods have been extensively applied to nonidentified
situations. Gelfand and Sahu (1999) have studied issues surrounding
nonidentifiability and improper priors in the context of generalized linear
models. Neath and Samaniego (1997) consider Bayesian updating for a nonidentified
two-parameter binomial model. Gustafson (2005) studies Bayesian
inference in nonidentified scenarios involving misclassification and measurement
errors, a paper that was discussed by a number of prominent researchers.
Recently, Moon and Schorfheide [(2009), hereafter, MS] have considered the
Bayesian approach to partially identified models in which the model can involve
three types of parameters: the structural parameters of interest, a reduced-form
parameter vector that is point-identified by the data and a vector of
auxiliary parameters which links the structural and reduced-form parameters
via some known function. They also derive Bayesian credible sets
and compare them with frequentist confidence intervals for a number of particular
models. All of these papers use traditional posteriors based on the
likelihood function instead of the moment inequalities.

Our Bayesian approach proceeds within a more general framework. In con- 
trast to the previous work, we do not need to have a full probability model 
for the observed data. Starting from moment inequalities $E\, m(X, \theta_0) \ge 0$,
where $m(X, \cdot)$ is a known function, we introduce a bias parameter
$\lambda_0 \ge 0$ so that $E\, m(X, \theta_0) = \lambda_0$, and place prior distributions on $(\theta_0, \lambda_0)$. The
posterior density of $\theta_0$ can then be derived based on a limited information
likelihood function, which is generated by the conditional asymptotic distribution
of $\frac{1}{n}\sum_{i=1}^n m(X_i, \theta_0) - \lambda_0$ given $(\theta_0, \lambda_0)$, integrating out $\lambda_0$. We study
in detail the frequentist behavior of the posterior density function of $\theta_0$. We
derive the bounds of convergence rates of the posterior density, both inside 
and outside of the identified region. We show that there is a large "gap" 
between them. Once the posterior density and its frequentist properties are 
obtained, it is easy to derive consistent estimators for the identified region. 
However, we point out that a posterior density provides more information 
than a region estimation since it can also incorporate prior information and 
describe how likely the true parameter is to be distributed both inside and 
outside the identified region. 

In addition to studying the identified region, we also consider the problem 
of selecting moments and models in the context of (1.1), where only a subset 
of the moment inequalities are to be used and the true parameter vector is 
assumed to follow a submodel allowing only some selected components to be 
nonzero (which can be, e.g., the regression coefficients of some selected explanatory
variables). Andrews and Soares (2007) employ a modified moment
selection procedure to determine which moment inequalities are not binding,
by minimizing an information-type criterion. The moment/model selection
problem we consider here is different from theirs. In this paper, we have two 
goals in the selection procedure: first, selecting a true moment/model and 
second, among the true candidates, selecting the "optimal" one, in a sense 
which will be described in Section 4. Since the true parameter is not point- 
identified, it is impossible to test the moment inequalities evaluated at the 
true parameter. Hence, the moment inequalities in this paper are true in the 
sense that, fixing the dimension of the parameter vector and the parameter 
space, the identified region defined by these moment inequalities on the fixed 
parameter space is not empty. In addition, we observe that whether a set 
of moment inequalities is satisfied or not also depends on the parameters 
that are included in the model and hence is related to the parameter space. 
In some situations, a set of moment inequalities defines a valid (nonempty) 
identified region on one parameter space, but cannot if one or more of the 
parameters are excluded from the model. This is a result of the reduction of 
the dimension of the parameter space. By treating the set of moments and 
the set of nonzero parameters as a combination, the problems of moment 
selection and model selection are combined. In Section 4, we propose the 
maximum posterior criterion (MPC) to select the combination that has the 
largest posterior probability. 

We are interested in examining the asymptotic properties of the moment/model
combinations selected by the maximum posterior criterion. We hope that
the maximum posterior criterion will produce a desirable combination in
the following three senses. First, asymptotically, it should be true. Second,
it should contain as many moment inequalities as possible
since, intuitively, the more moment inequalities we have, the smaller
the identified region is and hence the more information we have about the true
parameter. Finally, the model should be as simple as possible, that is, the
parameter subspace should have the smallest dimension. We show in Section
4 that, indeed, with suitable specifications of the prior, the maximum posterior
criterion can produce such a desirable combination with probability
tending to one as the sample size increases to infinity. Such a result will be
referred to as the consistency of the MPC for moment/model selection.

The remainder of this paper is organized as follows. Section 2 describes
a general moment inequality model and the construction of the limited information
likelihood. We also provide a general consistency theorem on set
estimation based on the posterior c.d.f. Section 3 provides a detailed
large-sample analysis of the behavior of the posterior distribution. In
particular, we derive the convergence rates both within the interior of
$\Omega$ and on any $\delta$-contraction outside $\Omega$. Section 4 studies the problem of
moment/model selection. Section 5 displays some simulation results. Finally,
Section 6 concludes with a discussion. Proofs are given in Appendices A-C.

2. Moment inequality models. 

2.1. Limited information likelihood. Suppose that for $\theta \in \mathbb{R}^d$, we have
moment inequality conditions

$$E\, m_j(X_i, \theta) \ge 0, \quad j = 1, \dots, p.$$

Let

$$m(X, \theta) = (m_1(X, \theta), m_2(X, \theta), \dots, m_p(X, \theta))^T.$$

Here, $\theta$ is the structural parameter of interest, for example, $\theta = EY$, the
mean of the unobserved random variable $Y$ in Examples 1.1 and 1.2, and $\lambda$
is the bias parameter of $E\, m(X, \theta)$, for example, $\lambda = (EY_2 - \theta, \theta - EY_1)^T$ in
Example 1.1. Let $\theta_0$ be the true parameter value of $\theta$ and $\lambda_0$ be the true bias
parameter when $\theta = \theta_0$. Suppose that the prior of $\theta_0$ is supported on a large
enough compact set that contains the identified region. We are interested in
constructing the marginal posterior for $\theta_0$.

In addition, let $\bar m(\theta) = \frac{1}{n}\sum_{i=1}^n m(X_i, \theta)$ and $G(\theta, \lambda) = \bar m(\theta) - \lambda$. Then,
after the bias parameter $\lambda$ is introduced, $G$ can be considered as the "debiased"
sample moment. In other words, $G$ is an estimating function with
$EG(\theta, \lambda) = 0$. It is overparametrized, meaning that the dimension of $(\theta, \lambda)$ is
greater than the dimension of $G$, and, hence, we cannot consistently estimate
$\theta_0$ by solving $G(\theta, \lambda) = 0$ directly.

Under some regularity conditions, by the central limit theorem,

$$\sqrt{n}\,\big(\bar m(\theta_0) - \lambda_0\big) \mid \theta_0, \lambda_0 \ \xrightarrow{d}\ N(0, V_0), \tag{2.2}$$

where $V_0 = \mathrm{Var}(m(X, \theta_0))$. We can therefore formally construct a "likelihood"
function:

$$p(X^n \mid \theta, \lambda) = \frac{1}{\sqrt{\det(2\pi V_0/n)}}\, e^{-\frac{n}{2} G(\theta, \lambda)^T V_0^{-1} G(\theta, \lambda)}. \tag{2.3}$$

Note that for $\theta \ne \theta_0$, (2.2) is not true in general. In fact, we cannot find a
$\lambda \in [0, \infty)^p$ such that $E\, m(X, \theta) = \lambda$ for $\theta \notin \Omega$. Hence, (2.3) is not the large-sample
conditional p.d.f. of $G$ for general $(\theta, \lambda)$. The asymptotic result (2.2)
alone would not allow us to derive a likelihood function over the entire
$\Theta \times [0, \infty)^p$. To solve this problem, Kim (2002) introduced the concept of limited
information likelihood. For each parameter $\theta \in \Theta$, although (2.3) may not
be the true probability density of $X^n$, it is shown to be proportional to the
density that is closest to the true density in the Kullback-Leibler distance,
among a family of densities satisfying the moment condition $EG(\theta, \lambda) = 0$.
The "likelihood" in (2.3) is therefore the limited information likelihood of
$\theta$ and $\lambda$, which is the best approximation to the true density that satisfies
the moment restrictions. The concept of the Kullback-Leibler information
distance and its applications can be found in a number of works, such as
Cover and Thomas (1991) and Zellner (1994).
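A minimal sketch of evaluating the logarithm of (2.3) for given $(\theta, \lambda)$, assuming the user supplies the $n \times p$ matrix of moment values $m(X_i, \theta)$ and a positive definite matrix used in place of $V_0$. The function name is hypothetical; the usage lines apply the moment functions $m(X, \theta) = (Y_2 - \theta, \theta - Y_1)^T$ of Example 1.1 to simulated data with $V = I$ and check that, for fixed $\theta$, the expression is largest at $\lambda = \bar m(\theta)$.

```python
import numpy as np


def log_limited_info_likelihood(moments, lam, V):
    """ln p(X^n | theta, lambda) in (2.3), with a user-chosen matrix V for V_0.

    moments: (n, p) array whose i-th row is m(X_i, theta).
    lam:     bias parameter lambda, a length-p vector in [0, inf)^p.
    V:       p x p positive definite matrix used in place of V_0.
    """
    moments = np.asarray(moments, dtype=float)
    V = np.asarray(V, dtype=float)
    n = moments.shape[0]
    G = moments.mean(axis=0) - np.asarray(lam, dtype=float)  # G(theta, lambda)
    quad = G @ np.linalg.solve(V, G)                          # G^T V^{-1} G
    _, logdet = np.linalg.slogdet(2.0 * np.pi * V / n)       # ln det(2 pi V / n)
    return -0.5 * n * quad - 0.5 * logdet


# Example 1.1 with simulated data: m(X, theta) = (Y2 - theta, theta - Y1)^T.
rng = np.random.default_rng(0)
y1 = np.floor(rng.uniform(0.0, 10.0, size=500))
y2 = y1 + 1.0
theta = 5.0
moments = np.column_stack([y2 - theta, theta - y1])
mbar = moments.mean(axis=0)
V = np.eye(2)

at_mean = log_limited_info_likelihood(moments, mbar, V)
shifted = log_limited_info_likelihood(moments, mbar + 0.1, V)
print(at_mean, at_mean - shifted)  # the difference is n/2 * 0.02 = 5 (up to rounding)
```

Because $\lambda$ enters only through $G(\theta, \lambda) = \bar m(\theta) - \lambda$, it acts as a location parameter, which is what makes the integral over $\lambda$ in (2.4) tractable.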

Let $p(\lambda)$ be the marginal prior of $\lambda$. Assume that $\lambda$ and $\theta$ are independent,
that is, the conditional prior of $\lambda$ given $\theta$ is equal to the marginal prior of
$\lambda$. Since we are only interested in $\theta$, we integrate out $\lambda$ to obtain the
limited information likelihood function for $\theta$:

$$L(\theta) = p(X^n \mid \theta) = \int_{[0,\infty)^p} p(X^n \mid \theta, \lambda)\, p(\lambda \mid \theta)\, d\lambda = \int_{[0,\infty)^p} p(X^n \mid \theta, \lambda)\, p(\lambda)\, d\lambda. \tag{2.4}$$

The fact that $\lambda$ is a location parameter of (2.3) makes the integration in (2.4) tractable.
This will be described in detail in Section 3.1.

In practice, the asymptotic variance $V_0$ in (2.3) is not known, but it can
be shown to have very little influence on the inference about $\theta$ in the current
setting of partially identified moment inequality models. In what follows,
we replace $V_0$ by a prespecified nonsingular matrix $V$ and
show that $L(\theta)$ has good (and very similar) frequentist properties for inference
on $\theta$, whichever $V$ is chosen. (A more delicate treatment would be to
approximate $V_0$ by a sample analog and replace the true parameter in it
by the unknown argument $\theta$. This will be left for future work. We expect
that similar techniques will lead to similar results in this treatment, but the
technical details can be much more complicated.)
2.2. A general result on the posterior set estimation. We first define some
notation that will be used subsequently. Throughout this paper, let $A^c$ and
$\mathrm{int}(A)$ denote the complement and interior of a set $A$, respectively. In addition,
following CHT's notation, for all $\delta > 0$, let $(\Omega^c)^{-\delta}$ be the $\delta$-contraction of $\Omega^c$:

$$(\Omega^c)^{-\delta} = \{\theta \in \Theta : d(\theta, \Omega) \ge \delta\}.$$

Let $B(\omega, r)$ denote an open ball around $\omega$: $B(\omega, r) = \{\theta : d(\omega, \theta) < r\}$, where
$d(\omega, \theta)$ denotes the Euclidean distance between $\omega$ and $\theta$. Let $d_H(A, B)$ denote
the Hausdorff distance between sets $A$ and $B$:

$$d_H(A, B) = \max\Big\{\sup_{a \in A} d(a, B),\ \sup_{b \in B} d(A, b)\Big\},$$

where $d(a, B) = \inf_{b \in B} d(a, b)$. We say that a set estimator $A_n$ consistently
estimates $\Omega$ if

$$d_H(A_n, \Omega) \to 0 \quad \text{in probability.}$$
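For numerical work it can help to approximate sets by finite grids. The following sketch computes the Hausdorff distance defined above between two finite point sets; treating grids as stand-ins for $A_n$ and $\Omega$ is an approximation introduced here, and the function name is illustrative.

```python
import numpy as np


def hausdorff_distance(A, B):
    """Hausdorff distance d_H(A, B) between two finite point sets.

    A, B: arrays of shape (m, d) and (k, d); 1-D inputs are treated as d = 1.
    """
    A = np.asarray(A, dtype=float)
    B = np.asarray(B, dtype=float)
    if A.ndim == 1:
        A = A[:, None]
    if B.ndim == 1:
        B = B[:, None]
    # Pairwise Euclidean distances d(a, b), shape (m, k).
    D = np.linalg.norm(A[:, None, :] - B[None, :, :], axis=-1)
    # d(a, B) = inf_b d(a, b); take sup over a, and symmetrically over b.
    return max(D.min(axis=1).max(), D.min(axis=0).max())


# A grid on [0, 1] versus a grid on [0, 1.2]: the point 1.2 is 0.2 away from [0, 1].
print(hausdorff_distance(np.linspace(0, 1, 101), np.linspace(0, 1.2, 121)))
```

Both suprema are needed: an estimator that covers only part of $\Omega$, or one that spills far outside it, is penalized either way.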

Moreover, for two sequences $\{a_n\}_{n=1}^\infty$ and $\{b_n\}_{n=1}^\infty$, we write $a_n \succ b_n$ if
$a_n/b_n \to \infty$. Finally, we write "w.p.a.1" as shorthand for "with probability
approaching one in the probability distribution of $X^n$ as $n \to \infty$."

Let $p(\theta)$ be the prior of $\theta$. By Bayes' rule, the posterior of $\theta$ then satisfies

$$p(\theta \mid X^n) \propto p(\theta) L(\theta). \tag{2.5}$$

It is desirable for the posterior to possess some "good" frequentist properties.
Roughly speaking, we want the posterior density of $\theta$ to
concentrate near $\Omega$ and drop dramatically to zero outside $\Omega$ with high
probability as $n$ increases. This marked difference in asymptotic behavior
between the inside and outside of the identified region implies that
the resulting posterior can produce a consistent set estimator
of $\Omega$. This relation between a "good" posterior and its capability
to estimate $\Omega$ is demonstrated below for a scalar function of $\theta$. (A more
general estimation of $\Omega$ itself will also be discussed in Section 3.)

The posterior probability that $\theta$ belongs to a set $A$ is

$$P(\theta \in A \mid X^n) = \int_A p(\theta \mid X^n)\, d\theta.$$

Definition 2.1 (Dense). A subset $A \subset \Omega$ is said to be dense in $\Omega$ if
for all $\omega \in \Omega \setminus A$ and any neighborhood $U_\omega$ of $\omega$, we have $U_\omega \cap A \ne \emptyset$.

An equivalent definition of dense subsets in real analysis is that the closure
of $A$ is $\Omega$, that is, $\mathrm{cl}(A) = \Omega$. We will consider the large-sample behavior of
the posterior distribution on a dense subset of $\Omega$.
Suppose that, instead of $\theta$, we are interested in a function of $\theta$, $g(\theta)$,
where $g: \Theta \to \mathbb{R}$ is some known continuous mapping. For instance, if we are
interested in the $i$th component of $\theta$, then $g(\theta) = \theta_i$. Let
$g(\Omega) = \{g(\theta) : \theta \in \Omega\}$, the image of $\Omega$ under $g$. We are interested in estimating
$g(\Omega)$ directly. We impose the following assumptions.

Assumption 2.1. $\Theta$ is compact.

Assumption 2.2. $\Omega$ is compact and connected.

In moment inequality models, the compactness of $\Omega$ follows from assuming
$E\, m_j(X, \cdot): \Theta \to \mathbb{R}$ to be continuous for each $j$. We assume $\Omega$ to be connected
so that the intermediate value theorem on a topological space holds.

Assumption 2.3. $g: \Theta \to \mathbb{R}$ is continuous on $\Theta$.

The estimator of $g(\Omega)$ to be constructed is based on the inverted posterior
c.d.f. of $g(\theta)$. Let $F_g(x) = P(g(\theta) \le x \mid X^n)$, the posterior c.d.f. of $g(\theta)$. Let

$$F_g^{-1}(y) = \inf\{x : F_g(x) \ge y\}.$$

Then $x \ge F_g^{-1}(y)$ if and only if $F_g(x) \ge y$. The following theorem provides a
general consistency result for a set estimator of $g(\Omega)$ based on the posterior
c.d.f. Note that since it can be shown that $g(\Omega) = [\inf_{\theta \in \Omega} g(\theta), \sup_{\theta \in \Omega} g(\theta)]$,
one might think that a more natural set estimator can be constructed by
finding estimators for the end points of the interval $g(\Omega)$. This idea works,
for example, in Example 1.1, where $[EY_1, EY_2]$ can be estimated by $[\bar Y_1, \bar Y_2]$,
as both $Y_1$ and $Y_2$ are observable. However, in a more general setting, estimating
the end points $\inf_{\theta \in \Omega} g(\theta)$ and $\sup_{\theta \in \Omega} g(\theta)$ would require estimating
$\Omega$ first. The estimator proposed in the following theorem provides a way of
estimating the interval directly.

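Theorem 2.1. Under Assumptions 2.1-2.3, assume that there exists
$\{\pi_n\}_{n=1}^\infty$, $\pi_n \to 0$, such that:

1. for all $\delta > 0$, $P(\theta \in (\Omega^c)^{-\delta} \mid X^n) = o_p(\pi_n)$;

2. there exists a dense subset $A \subset \Omega$ such that for all $\omega \in A$, there exists $R_\omega > 0$ such that
when $\rho < R_\omega$, $P(\theta \in B(\omega, \rho) \mid X^n) \succ \pi_n$ w.p.a.1.

If we let $\hat g = [F_g^{-1}(\pi_n), F_g^{-1}(1 - \pi_n)]$, then

$$d_H(\hat g, g(\Omega)) \to 0 \quad \text{in probability.}$$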
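In practice, $F_g$ would typically be replaced by the empirical c.d.f. of posterior draws of $g(\theta)$, for example from an MCMC run. A minimal sketch under that assumption: for $N$ draws, the generalized inverse $\inf\{x : F_N(x) \ge y\}$ is the $\lceil yN \rceil$-th order statistic, and the interval $\hat g$ of Theorem 2.1 follows directly. The function names are hypothetical.

```python
import numpy as np


def posterior_quantile(draws, y):
    """Generalized inverse inf{x : F_N(x) >= y} of the empirical c.d.f. F_N.

    draws: posterior draws of g(theta); y: level in (0, 1].
    """
    x = np.sort(np.asarray(draws, dtype=float))
    # F_N(x_(k)) >= k / N, so the infimum is the ceil(y * N)-th order statistic.
    k = int(np.ceil(y * x.size))
    return x[max(k, 1) - 1]


def posterior_set_estimate(draws, pi_n):
    """The interval [F^{-1}(pi_n), F^{-1}(1 - pi_n)] from Theorem 2.1."""
    return posterior_quantile(draws, pi_n), posterior_quantile(draws, 1.0 - pi_n)


# Toy draws 1, 2, ..., 100 with pi_n = 0.025 give the interval (3.0, 98.0).
print(posterior_set_estimate(np.arange(1, 101), 0.025))
```

With a finite number of draws, levels below $1/N$ cannot be resolved, so $N$ should be large relative to $1/\pi_n$.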

Remark 2.1.
1. The consistent set estimator depends on the choice of $\pi_n$. However, we
do not pursue an operational way of constructing the estimator based
on the posterior distribution in this paper because there are many frequentist
methods to achieve this purpose, for instance, CHT, Beresteanu
and Molinari (2008), etc. This paper focuses more on the posterior
distribution itself. The purpose of this theorem is to demonstrate that
the posterior can be used to consistently estimate the identified region, if
needed. The posterior distribution can actually provide more information
than the identified region when the prior is taken into account.

2. We can also provide an exact credible region (based on, say, setting $\pi_n = 0.025$)
for the true parameter, conditional on the observed
data. This parallels the provision of confidence intervals with
required coverage probabilities in the frequentist approaches of Imbens
and Manski (2004), Rosen (2008), etc.

3. It is possible to choose the rate of $\pi_n$ so as to achieve the optimal convergence
rate in Hausdorff distance. We leave this for future work.

In the next section, we will see that, under some regularity conditions, the
posterior distribution of $\theta$ satisfies conditions 1 and 2 of this theorem, which
describe the frequentist properties of the posterior. In addition, we will also
propose a consistent estimator for $\Omega$ directly based on the log-posterior
density.

3. Posterior properties of moment inequality models. In this section,
we assume that the identified region has a nonempty interior $\mathrm{int}(\Omega)$.
Assuming it is dense in $\Omega$, it is then of interest to study the asymptotic
properties of the posterior distribution inside $\mathrm{int}(\Omega)$.

3.1. The posterior density. Following the discussion in Section 2.1, we
will study a limited information likelihood for $\theta$ defined by

$$L(\theta) = \int_{[0,\infty)^p} \frac{1}{\sqrt{\det(2\pi V/n)}}\, e^{-\frac{n}{2}(\bar m(\theta) - \lambda)^T V^{-1} (\bar m(\theta) - \lambda)}\, p(\lambda)\, d\lambda, \tag{3.1}$$

where $V$ is some preselected positive definite matrix that does not depend
on $\theta$. We will use a multivariate exponential distribution as the prior on $\lambda$
throughout this paper:

$$p(\lambda) = \prod_{j=1}^p \psi_j e^{-\psi_j \lambda_j}, \qquad \lambda \in [0, \infty)^p,$$

where $\psi_j > 0$ is prespecified. We use the exponential prior for ease of integration
over $\lambda$. More general choices of $p(\lambda)$ may not allow the integration to be carried
out analytically, but the large-sample behavior of the posterior should
remain unchanged.
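Let $\psi = (\psi_1, \dots, \psi_p)^T$, and let $Z_\theta$ be a $p$-dimensional multivariate normal random vector with mean
$\bar m(\theta) - V\psi/n$ and variance-covariance matrix $V/n$. A straightforward calculation
of (3.1) then leads to

$$L(\theta) = \Big(\prod_{j=1}^p \psi_j\Big)\, e^{-\psi^T \bar m(\theta) + \psi^T V \psi/(2n)}\, P(Z_\theta \ge 0),$$

and we have $p(\theta \mid X^n) \propto p(\theta) L(\theta)$.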

For large values of $n$, by the uniform weak law of large numbers, $\bar m(\theta)$ is
bounded on $\Theta$ w.p.a.1. Thus, for fixed $\psi$ and $V$, $e^{-\psi^T \bar m(\theta) + \psi^T V \psi/(2n)} \prod_{j=1}^p \psi_j$
is bounded away from zero and infinity. Therefore, the only term that characterizes
the large-sample properties of the posterior should be $P(Z_\theta \ge 0)$.
Moreover, the variance-covariance matrix of $Z_\theta$ is of order $O(n^{-1})$, so
we would expect that $\lim_{n\to\infty} P(Z_\theta \ge 0) = 1$ in probability if and only if
$\bar m(\theta) > 0$ w.p.a.1. This depends on whether or not $\theta$ belongs to $\Omega$. For
large $n$, the posterior density is positive inside $\Omega$ and drops to zero exponentially
fast as $\theta$ moves away from $\Omega$. We will formally examine these asymptotic
properties in the next section and will also derive the convergence rate of
the posterior probabilities.
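The closed form for $L(\theta)$ can be evaluated directly. The sketch below makes one simplifying assumption not made in the paper: $V$ is diagonal, so the components of $Z_\theta$ are independent and $P(Z_\theta \ge 0) = \prod_j \Phi\big((\bar m_j(\theta) - V_{jj}\psi_j/n)/\sqrt{V_{jj}/n}\big)$, with $\Phi$ the standard normal c.d.f. (a general $V$ would require a multivariate normal orthant probability). Function names are hypothetical; the usage lines apply Example 1.1 to simulated data with $V = I$ and $\psi = (1, 1)^T$.

```python
import math

import numpy as np


def log_norm_cdf(x):
    """ln Phi(x), switching to the leading Mills-ratio term for x <= -30.

    For x <= -30, ln Phi(x) ~ -x^2/2 - ln(-x) - ln(2*pi)/2 (erfc would underflow).
    """
    if x > -30.0:
        return math.log(0.5 * math.erfc(-x / math.sqrt(2.0)))
    return -0.5 * x * x - math.log(-x) - 0.5 * math.log(2.0 * math.pi)


def log_L(moments, psi, v_diag):
    """ln L(theta) from (3.1) with Exp(psi_j) priors, assuming V = diag(v_diag).

    moments: (n, p) array whose i-th row is m(X_i, theta).
    """
    moments = np.asarray(moments, dtype=float)
    n = moments.shape[0]
    psi = np.asarray(psi, dtype=float)
    v = np.asarray(v_diag, dtype=float)
    mbar = moments.mean(axis=0)
    mean_z = mbar - v * psi / n  # mean of Z_theta: mbar - V psi / n
    sd_z = np.sqrt(v / n)        # marginal s.d.'s of Z_theta
    log_front = np.log(psi).sum() - psi @ mbar + psi @ (v * psi) / (2.0 * n)
    log_prob = sum(log_norm_cdf(mu / s) for mu, s in zip(mean_z, sd_z))
    return float(log_front + log_prob)


# Example 1.1 on simulated data, V = I and psi = (1, 1).
rng = np.random.default_rng(1)
y1 = np.floor(rng.uniform(0.0, 10.0, size=2000))
y2 = y1 + 1.0


def ll(theta):
    m = np.column_stack([y2 - theta, theta - y1])
    return log_L(m, psi=[1.0, 1.0], v_diag=[1.0, 1.0])


for theta in (4.9, 5.0, 5.1, 6.5):
    print(theta, ll(theta))
```

For $\theta$ well inside $[\bar Y_1, \bar Y_2]$ the log-likelihood is essentially constant, since $-\psi^T \bar m(\theta) = -(\bar Y_2 - \bar Y_1)$ does not depend on $\theta$ when $\psi_1 = \psi_2$, while at $\theta = 6.5$ it is lower by hundreds of log units, in line with the exponential decay outside $\Omega$ described above.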

3.2. Large-sample analysis. We now provide a large-sample analysis of 
the posterior distribution of the parameter 9. 

Assumption 3.1. $\mathrm{int}(\Omega)$ is nonempty and is dense in $\Omega$.

The assumption that $\mathrm{int}(\Omega)$ is dense in $\Omega$ can be restated as follows: for
any $\omega$ on the boundary of $\Omega$ and any neighborhood $U_\omega$ of $\omega$, $U_\omega$ contains
points in $\mathrm{int}(\Omega)$. Most of the identified regions characterized by moment
inequalities possess such a property. We will comment on the case when
$\mathrm{int}(\Omega)$ is empty in the discussion section.

Assumption 3.2. $E\, m_j(X, \cdot): \Theta \to \mathbb{R}$ is continuous for each $j = 1, \dots, p$.

This assumption guarantees that $E\, m(X, \theta)$ is bounded on any compact
set and that the uniform law of large numbers holds. The next assumption
puts a regularity condition on the prior of $\theta$.

Assumption 3.3. $p(\theta)$ is continuous and bounded away from zero and
infinity on $\Omega$.

Let $V_{jj}$ be the $j$th diagonal element of $V$. We can write



(3.2) 
For any $\delta > 0$, let

$$A_\delta = \Big\{\theta : \min_{j \le p} \frac{E\, m_j(X, \theta)}{\sqrt{V_{jj}}} < -\delta\Big\}.$$

Clearly, $\Omega \subset A_\delta^c$.

Lemma 3.1. Under Assumptions 2.1, 2.2 and 3.2, if there exists
some $a_n$ such that for all $\delta > 0$, $P(\theta \in A_\delta \mid X^n) = o_p(a_n)$, then for all $\epsilon > 0$,
$P(\theta \in (\Omega^c)^{-\epsilon} \mid X^n) = o_p(a_n)$.

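Theorem 3.1. Under Assumptions 2.1, 2.2 and 3.1-3.3:

1. for all $\delta > 0$, for some $\alpha > 0$,
$$P(\theta \in (\Omega^c)^{-\delta} \mid X^n) = o_p(e^{-\alpha n});$$

2. for all nonempty open sets $\Gamma \subset \Omega$, in probability,
$$\liminf_{n\to\infty} P(\theta \in \Gamma \mid X^n) > 0.$$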

Hence, we are able to distinguish the asymptotic behavior of the posterior
inside and outside $\Omega$: for large values of $n$, the posterior density is only supported on a
neighborhood of the identified region, and the posterior distribution drops
to zero exponentially fast on any subset that is separated from $\Omega$. Based
on these findings, we can construct consistent estimators for both $\Omega$ and its
continuous mappings. For the latter task, we can now apply Theorem 2.1.
Suppose that $g(\cdot)$ is a continuous real-valued function on $\Theta$, and let $F_g^{-1}(y)$
be the $y$-quantile of the posterior c.d.f. of $g(\theta)$.

Theorem 3.2. Under Assumptions 2.1-2.3 and 3.1-3.3, for any sequence
$\pi_n = o(1)$ such that $e^{-\alpha n}/\pi_n \to 0$ for all $\alpha > 0$, we have

$$d_H\big([F_g^{-1}(\pi_n), F_g^{-1}(1 - \pi_n)], g(\Omega)\big) \to 0 \quad \text{in probability.}$$

We can also consistently estimate $\Omega$ directly using the posterior density
function. The consistency is based on the fact that the posterior density
attains its peak inside $\Omega$ and is asymptotically supported on the entire identified
region. In addition, it drops to zero outside $\Omega$ at an exponential rate.
Therefore, by properly choosing a cut-off value $\epsilon_n$, the region where the log-posterior
p.d.f. exceeds its peak minus $\epsilon_n$ should eventually converge to $\Omega$.

Theorem 3.3. Under Assumptions 2.1-2.3 and 3.1-3.3, let $1 \prec \epsilon_n \prec n$.
If we define

$$A_n = \Big\{\theta : \max_{\omega \in \Theta} \ln p(\omega \mid X^n) - \ln p(\theta \mid X^n) < \epsilon_n\Big\},$$

then

$$d_H(A_n, \Omega) \to 0 \quad \text{in probability.}$$

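Remark 3.1. The estimator established in Theorem 3.3 is easy to implement
for the following reasons.

1. Note that
$$\begin{aligned}
\max_{\omega} \ln p(\omega \mid X^n) - \ln p(\theta \mid X^n)
&= \max_{\omega}\Big[\ln p(\omega)L(\omega) - \ln \int_\Theta p(\vartheta)L(\vartheta)\, d\vartheta\Big] - \Big[\ln p(\theta)L(\theta) - \ln \int_\Theta p(\vartheta)L(\vartheta)\, d\vartheta\Big] \\
&= \max_{\omega} \ln p(\omega)L(\omega) - \ln p(\theta)L(\theta).
\end{aligned}$$
Therefore, there is no need to normalize $p(\theta)L(\theta)$, which avoids numerical
integration of $p(\theta)L(\theta)$.

2. Maximizing $\ln p(\theta)L(\theta)$ is computationally workable since the maximum is
attained only inside $\Omega$, where $p(\theta)L(\theta)$ is quite smooth. Hence, the Newton-Raphson
algorithm can carry out the maximization.

3. If we set $a_n = \max_{\omega \in \Theta} \ln p(\omega \mid X^n) - \epsilon_n$, then $A_n = \{\theta : \ln p(\theta \mid X^n) > a_n\}$. The
boundary $\{\theta : \ln p(\theta \mid X^n) - a_n = 0\}$ is a closed curve with dimension $d - 1$.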
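A grid-based sketch of the estimator $A_n$ of Theorem 3.3, relying on Remark 3.1(1): the unnormalized $\ln p(\theta) + \ln L(\theta)$ can be used because the normalizing constant cancels. Replacing $\Theta$ by a finite grid is an approximation introduced here. The usage lines plug in a stylized, hypothetical log posterior that is flat on $\Omega = [1, 2]$ and falls off as $-n\, d(\theta, \Omega)^2$ outside, with $\epsilon_n = \sqrt{n}$, one choice satisfying $1 \prec \epsilon_n \prec n$; the function names are illustrative.

```python
import numpy as np


def log_posterior_level_set(grid, log_post_unnormalized, eps_n):
    """Grid version of A_n = {theta : max_w ln p(w|X^n) - ln p(theta|X^n) < eps_n}.

    log_post_unnormalized: theta -> ln p(theta) + ln L(theta), up to a constant.
    grid: 1-D array (or sequence of points) covering Theta.
    """
    values = np.array([log_post_unnormalized(t) for t in grid])
    keep = values.max() - values < eps_n
    return np.asarray(grid)[keep]


def stylized_log_post(theta, n=1000, lo=1.0, hi=2.0):
    # Hypothetical stand-in: flat on [lo, hi], -n * distance^2 outside.
    d = max(lo - theta, 0.0, theta - hi)
    return -n * d * d


A = log_posterior_level_set(np.linspace(0.0, 3.0, 301), stylized_log_post, 1000 ** 0.5)
print(A.min(), A.max())
```

With $n = 1000$ the retained grid points are those within about $\sqrt{\epsilon_n/n} \approx 0.18$ of $\Omega$, giving roughly $[0.83, 2.17]$; in this stylized setting the margin $n^{-1/4}$ shrinks to zero as $n$ grows.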



4. Moment and model selection. In this section, we discuss the problem 
of moment and model selection. Suppose that we have p candidate moment 
inequalities 

$$E\, m_j(X, \theta) \ge 0, \quad j = 1, \dots, p,$$

with a $k$-dimensional parameter vector $\theta = (\theta_1, \dots, \theta_k)^T \in \Theta_1 \times \cdots \times \Theta_k$. The
moment selection problem refers to selecting the best subset of the moment 
inequalities among all of the possible candidates (where there is some notion 
of optimality), while the model selection procedure addresses the problem 
of selecting the best model among all subsets of the parameter space where 
some components of the parameter are set to zero. The possible moment 
inequalities and corresponding subsets of the parameter space are known. 
What is not known is which ones are the best. 

Instead of selecting the moment inequalities and the parameter subspace 
as two separate procedures, we select them simultaneously, as a combina- 
tion. However, there are still two problems to consider: selecting the true 
combination and, among the true combinations, selecting the optimal one, 
in the sense that it should contain as many moment inequalities and as few 
structural parameters as possible. The selection procedure is based on the
maximum posterior criterion (MPC): we assign prior probabilities to each
candidate moment/model combination and then derive the posterior probabilities based
on the limited information likelihood described in Section 2, by integrating
out the structural and nuisance parameters $(\theta, \lambda)$. Finally, the optimal combination
is the one with the largest posterior probability. We will
examine the asymptotic property of the optimal combination by establishing
the consistency of the MPC. By consistency, we mean that, w.p.a.1, the MPC will
select the true combination with the most moment inequalities and the fewest
structural parameters (the simplest parameter subspace).
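Once the log marginal limited information likelihood of each combination has been computed (by integrating out $(\theta, \lambda)$; that step is assumed to be done elsewhere), the MPC step itself reduces to normalizing on the log scale and taking an argmax. A minimal sketch, with hypothetical combination labels and log values in the usage lines:

```python
import numpy as np


def maximum_posterior_criterion(log_marginal, log_prior):
    """Pick the moment/model combination with the largest posterior probability.

    log_marginal[s]: ln of the marginal limited information likelihood of
                     combination s, with (theta, lambda) integrated out.
    log_prior[s]:    ln of the prior probability assigned to combination s.
    Returns the selected key and the normalized posterior probabilities.
    """
    keys = list(log_marginal)
    scores = np.array([log_marginal[s] + log_prior[s] for s in keys])
    # Log-sum-exp normalization avoids overflow/underflow in exp(scores).
    log_norm = scores.max() + np.log(np.exp(scores - scores.max()).sum())
    post = {s: float(np.exp(v - log_norm)) for s, v in zip(keys, scores)}
    return max(post, key=post.get), post


# Hypothetical log marginal likelihoods for three candidate combinations.
log_marginal = {
    ("m1,m2", "full model"): -10.0,
    ("m1", "full model"): -12.0,
    ("m1,m2", "theta1 = 0"): -500.0,
}
log_prior = {s: np.log(1.0 / 3.0) for s in log_marginal}
best, post = maximum_posterior_criterion(log_marginal, log_prior)
print(best, post)
```

With equal prior weights the choice is driven entirely by the marginal likelihoods; unequal weights are one way to encode a preference for more moment inequalities or fewer parameters.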

4.1. Selecting the true combination. Because of the feature of partial 
identification, it is impossible to test whether each candidate moment inequality
is true at the true parameter. However, given a set of moment inequalities
and a parameter space, we can check whether the moment inequalities define
a nonempty identified region on the parameter space.

Example 4.1 (Interval regression model). Suppose that an interval regression model provides moment inequalities as follows:

$E(Z_1Y_1) \le E(Z_1Y) \le E(Z_1Y_2);$
$E(Z_2Y_1) \le E(Z_2Y) \le E(Z_2Y_2).$

We assume that the data-generating process is $E(Y\mid X) = X^T\theta_0$, where $\theta_0 = (0.9, 0)^T$ is the true parameter, $X_1 \sim$ uniform$[-1,1]$ and $X_2 = 1$ a.s. Furthermore, let $(Z_1, Z_2)^T = (\ldots)$ and $(Y_1, Y_2)^T = (\ldots)$, where $U_1$ and $U_2$ are independent uniform$[-1,1]$ random variables. Then $\theta_0$ is partially identified by the moment inequality model. If we let $(\theta_1, \theta_2)^T \in \Theta$ be the parameter vector, then we have four moment inequalities:

(4.1) $\tfrac{1}{3}\theta_1 + \theta_2 \ge 0.2$;
(4.2) $\tfrac{1}{3}\theta_1 + \theta_2 \le 0.4$;
(4.3) $\theta_2 \ge -0.1$;
(4.4) $\theta_2 \le 0.1$.

The region defined by (4.1)-(4.4) on $\Theta$ gives a nonempty identified region for $\theta_0$ with a parallelogram boundary. However, if we set $\theta_1 = 0$, (4.1) contradicts (4.4): (4.1) then requires $\theta_2 \ge 0.2$, while (4.4) requires $\theta_2 \le 0.1$. Hence, in this case, (4.1)-(4.4) define an empty region.

Let us define a combination $C_s = (M_{s_1}, \Theta_{s_2})$ with a vector index $s = (s_1, s_2)$, $s_1 \in \{1, 2, \ldots, 2^p - 1\}$ and $s_2 \in \{1, \ldots, 2^k\}$. Here, $M_{s_1}$ denotes a subset of moments, for instance, $M_{s_1} = \{m_1\}$, $M_{s_1'} = \{m_1, m_2\}$, etc. There are then $2^p - 1$ such possible subsets. In addition, we denote by $\Theta_{s_2}$ the parameter subspace corresponding to the selected model. By definition, $\Theta_{s_2}$ is the subset of parameter vectors with a given set of components (possibly empty) fixed at zero, so there are $2^k$ possible $\Theta_{s_2}$'s. (Note that we may also select none of the parameters, in which case the model is fully reduced; e.g., in Cox's proportional hazards model, setting all of the parameters to zero gives the baseline model.) The combination $C_s$ combines both the candidate moment functions and the parameter subspace: when selecting a subset of moment inequalities, we also specify a subspace of the structural parameter.

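Example 4.2 (Example 4.1 continued). Let $\Theta_1 \times \Theta_2$ be the parameter space for $(\theta_1, \theta_2)^T$, chosen large enough so that $\{(\theta_1, \theta_2): 0.2 \le \tfrac{1}{3}\theta_1 + \theta_2 \le 0.4,\ -0.1 \le \theta_2 \le 0.1\} \subset \Theta_1 \times \Theta_2$. The candidate combinations include, for example, the following:

$\{E(Z_1X^T\theta - Z_1Y_1)\}$, $\Theta_1 \times \Theta_2$;

$\{E(Z_1X^T\theta - Z_1Y_1), E(Z_1Y_2 - Z_1X^T\theta)\}$, $\Theta_1 \times \Theta_2$;

$\{E(Z_2X^T\theta - Z_2Y_1)\}$, $\{0\} \times \Theta_2$;

$\{E(Z_1X^T\theta - Z_1Y_1), E(Z_1Y_2 - Z_1X^T\theta), E(Z_2Y_2 - Z_2X^T\theta)\}$, $\Theta_1 \times \{0\}$;

$\vdots$

$\{E(Z_2Y_2 - Z_2X^T\theta)\}$, $\Theta_1 \times \Theta_2$.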

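Definition 4.1. A combination $C_s = (M_{s_1}, \Theta_{s_2})$ is true if and only if

$\inf_{\theta\in\Theta_{s_2},\ \lambda\in[0,\infty)^m} \|EM_{s_1}(X,\theta) - \lambda\|^2 = 0,$

where $m$ denotes the number of candidate moment functions in $M_{s_1}$.

If we let $\Omega_s = \{\theta \in \Theta_{s_2}: EM_{s_1}(X,\theta) \ge 0\}$ be the identified region defined by $C_s$, then, under the compactness and continuity conditions of Assumptions 4.1 and 4.4 below (which guarantee that the infimum is attained), this definition is equivalent to saying that $\Omega_s$ is not empty.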

We place a discrete prior $p(C_s)$ on all of the candidate combinations. In practice, such a prior can be uniform [i.e., $p(C_s) = 1/(2^k(2^p - 1))$ for all $s$], model-dependent or obtained from previous studies. As in the previous sections, let

$\lambda = EM_{s_1}(X,\theta)$

with $\dim(\lambda) = m$, and use the following prior conditional on $C_s$:

$p(\lambda\mid C_s) = \prod_{i=1}^m \psi_i e^{-\psi_i\lambda_i}, \quad \lambda \ge 0,$

where the $\psi_i$'s are the prespecified second-stage parameters. Let $p(\theta\mid C_s)$ be the conditional prior of the parameter $\theta \in \Theta_{s_2}$. The conditional limited information likelihood is given by

$L(X^n\mid\theta,\lambda,C_s) = \dfrac{1}{\sqrt{\det(2\pi V/n)}}\exp\Big\{-\dfrac{n}{2}\big(\hat M_{s_1}(\theta) - \lambda\big)^T V^{-1}\big(\hat M_{s_1}(\theta) - \lambda\big)\Big\},$

where $\hat M_{s_1}(\theta) = \frac{1}{n}\sum_{i=1}^n M_{s_1}(X_i,\theta)$. The posterior of $C_s$ can then be obtained by integrating out $\theta$ and $\lambda$; it is proportional to the "integrated likelihood,"

(4.5) $p(C_s\mid X^n) \propto \int_{\Theta_{s_2}\times[0,\infty)^m} L(X^n\mid\theta,\lambda,C_s)\,p(\theta\mid C_s)\,p(\lambda\mid C_s)\,p(C_s)\,d\theta\,d\lambda.$

A remark on the "$d\theta$" part of this integration: the integration is with respect to the nonzero elements of $\theta \in \Theta_{s_2}$, where $\Theta_{s_2}$ is the parameter space of those free parameters only. For instance, suppose that the full parameter is $(\theta_1, \theta_2, \theta_3) \in \Theta_1 \times \Theta_2 \times \Theta_3$. Once we set $\theta_3 = 0$, we have $\Theta_{s_2} = \Theta_1 \times \Theta_2$, and integrating over $\theta$ becomes a two-dimensional integration (w.r.t. $\theta_1, \theta_2$). Otherwise, if we set $\Theta_{s_2} = \Theta_1 \times \Theta_2 \times \{0\}$ and still treated it as a three-dimensional integration, $\Theta_{s_2}$ would have zero Lebesgue measure and, as a result, the integral would always be zero.
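Because the prior on $\lambda$ is a product of exponential densities and the limited information likelihood is Gaussian in $\lambda$, the inner integral over $\lambda$ in (4.5) can be evaluated in closed form by completing the square. As a sketch, writing the prior in the rate form $\prod_{i=1}^m \psi_i e^{-\psi_i\lambda_i}$ and setting $\psi = (\psi_1, \ldots, \psi_m)^T$,

$$\int_{[0,\infty)^m} L(X^n\mid\theta,\lambda,C_s)\,p(\lambda\mid C_s)\,d\lambda = \Big(\prod_{i=1}^m \psi_i\Big)\exp\Big\{-\psi^T\hat M_{s_1}(\theta) + \frac{\psi^T V\psi}{2n}\Big\}\,P(Z \ge 0), \qquad Z \sim N_m\Big(\hat M_{s_1}(\theta) - \frac{V\psi}{n},\ \frac{V}{n}\Big).$$

This has the same form as $L(\theta)$ in the proof of Theorem 3.1 in Appendix B. It reduces (4.5) to an integral over $\theta$ alone, and when $V$ is diagonal the orthant probability $P(Z \ge 0)$ factorizes into a product of univariate normal c.d.f.'s.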

We make the following assumptions. 



Assumption 4.1. The parameter space $\Theta$ of the full model is compact.

The next assumption is imposed on the prior of $\theta$.

Assumption 4.2. If $C_s$ is true, then $p(\theta \in \Omega_s\mid C_s) > 0$.

Assumption 4.3. $p(C_s) > 0$ for each combination $C_s$.

The following assumption implies that $\Omega_s$ is nonempty and compact, given that $C_s$ is true.

Assumption 4.4. $Em_j(X,\theta)$ is continuous on $\Theta_{s_2}$ for each $m_j$ in $C_s$.

Intuitively, we should select as many moment inequalities as possible since 
the more moment inequalities there are, the smaller the identified region 
is. However, if one or more of the selected moment inequalities are false, 
the identified region is empty. The following theorem illustrates that the 
posterior probability is exponentially small if the selected combination is 
false. 
Theorem 4.1. For a combination $C_s = (M_{s_1}, \Theta_{s_2})$, under Assumptions 4.1-4.4:

1. if $C_s$ is true, then, in probability,

$\liminf_{n\to\infty} p(C_s\mid X^n) > 0;$

2. if $C_s$ is not true, then for some $\alpha > 0$,

$p(C_s\mid X^n) = O_p(e^{-\alpha n})\,p(C_s).$

4.2. Selecting the optimal combination. The maximum posterior procedure provides an optimality criterion: it selects the combination with the largest posterior probability,

(4.6) $C^* = \arg\max_{C_s} p(C_s\mid X^n).$

We are interested in studying the asymptotic properties of the optimal 
C* . We hope that the MPC will produce a desirable combination in the 
following three senses. First, asymptotically, C* should be true. Second, it 
is desirable that it should contain as many moment inequalities as possible 
since, intuitively, the more moment inequalities we have, the smaller the 
identified region is and hence we have more information about the true 
parameter. Finally, the model should be as simple as possible. 

We impose the following assumption in addition to Assumptions 4.1-4.4. 



Assumption 4.5. Each true candidate combination $C_s$ defines a connected $\Omega_s$.

We first consider using a (discrete) uniform prior for the candidate combinations: for all $C_s$,

(4.7) $p(C_s) = \dfrac{1}{2^k(2^p - 1)}.$

Although this seems to be a natural prior to use, we found examples where it actually behaves undesirably for model or moment selection. For example, suppose that we want to compare the posterior probabilities of two candidate combinations, $C^1 = (M^1_{s_1}, \Theta^1_{s_2})$ and $C^2 = (M^2_{s_1}, \Theta^2_{s_2})$, using the Bayes factor

$BF_{12} = \dfrac{p(C^1\mid X^n)}{p(C^2\mid X^n)}.$

We fix $\Theta^1_{s_2} = \Theta^2_{s_2}$ and assume $M^1_{s_1} \subset M^2_{s_1}$, that is, the moment inequalities of $M^1_{s_1}$ are strictly contained in those of $M^2_{s_1}$. If both $C^i$, $i = 1,2$, are true, then the identified region $\Omega^2$ defined by $C^2$ is contained in the identified region $\Omega^1$ defined by $C^1$. As explained before, a smaller identified region is preferable since it provides more precise information about the true parameter. Hence, we hope that $BF_{12}$ is asymptotically less than one if the MPC criterion is consistent with this intuition. However, the following theorem shows that, if the uniform prior (4.7) is specified, then the result goes in exactly the opposite direction.

For a matrix $H$, define $\|H\|^2 = \operatorname{trace}(HH^T)$.

Theorem 4.2. Suppose that Assumptions 4.1-4.5 are satisfied and the uniform prior (4.7) is applied. Suppose that both $C^i$, $i = 1,2$, are true, $M^1_{s_1} \subset M^2_{s_1}$ and $\Theta^1_{s_2} = \Theta^2_{s_2}$. In addition, suppose that $\psi_i$ satisfies

$\psi_i < e^{-\|V^{-1}\|\sup_{\theta}\|Em(X,\theta)\|^2}, \quad i = 1,\ldots,p.$

Then, w.p.a.1,

$BF_{12} > 1.$

Here, $\psi_i$ is the second-stage parameter of the exponential prior of $\lambda_i$. In practice, a small $\psi$ is preferable because it leads to a noninformative prior on $\lambda$. However, Theorem 4.2 says that if $\psi$ is small (satisfying $\psi_i < e^{-\|V^{-1}\|\sup_{\theta}\|Em(X,\theta)\|^2}$, $i = 1,\ldots,p$) and a uniform prior for the candidate combinations is used, then the result is negative: a candidate with fewer moment constraints has a larger posterior probability. This is not a warning about the method itself, but rather about the potential danger of the seemingly innocent choice of a uniform prior on the candidate combinations. With this prior, it will be shown in Appendix C that the posterior of each true combination is of order $O_p(1)$ and, up to the leading order, is proportional to the prior measure of the identified region, as well as to the product of the $\psi_i$'s. As more moment inequalities are added, the identified region gets smaller. Also, more small $\psi_i$'s are added to the product term. Both of these factors make the resulting posterior probability smaller.
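The mechanism behind Theorem 4.2 can be checked numerically in a toy version of the interval data model used later in Example 5.1. The sketch below is our own illustration, not the authors' code, and it rests on several assumptions: a scalar $\theta$ with a uniform prior on a hypothetical $\Theta = [-10, 10]$; $V = 1$; the exponential prior in the rate form $\psi e^{-\psi\lambda}$; and an illustrative common value $\psi = 0.01$. With $V = 1$, each moment's $\lambda$-integral has the closed form $\psi\exp(-\psi\hat m + \psi^2/(2n))\,\Phi(\sqrt{n}(\hat m - \psi/n))$, and the $\theta$-integral is done with the trapezoid rule. $C^1$ uses only $E(Y_2 - \theta) \ge 0$, while $C^2$ adds $E(\theta - Y_1) \ge 0$, so $M^1_{s_1} \subset M^2_{s_1}$ and both share $\Theta$.

```python
import math
import random


def norm_cdf(x):
    """Standard normal c.d.f. via the complementary error function."""
    return 0.5 * math.erfc(-x / math.sqrt(2.0))


def lambda_factor(m_hat, psi, n):
    """Closed-form integral over lambda >= 0 of the N(m_hat, 1/n) density times
    psi * exp(-psi * lambda) (assumed rate form of the prior; one moment, V = 1)."""
    return (psi * math.exp(-psi * m_hat + psi ** 2 / (2.0 * n))
            * norm_cdf(math.sqrt(n) * (m_hat - psi / n)))


def integrated_likelihood(moments, psis, n, lo=-10.0, hi=10.0, grid=4001):
    """Trapezoid-rule integral over theta in a hypothetical Theta = [lo, hi] of a
    uniform prior density times the product of the per-moment lambda factors."""
    h = (hi - lo) / (grid - 1)
    total = 0.0
    for k in range(grid):
        theta = lo + k * h
        value = 1.0 / (hi - lo)  # uniform prior density on Theta
        for m_hat, psi in zip(moments, psis):
            value *= lambda_factor(m_hat(theta), psi, n)
        weight = 0.5 if k in (0, grid - 1) else 1.0
        total += weight * value * h
    return total


# Interval data as in Example 5.1: Y1 ~ N(0, 0.1), Y2 ~ N(5, 0.1).
rng = random.Random(1)
n = 1000
y1_draws, y2_draws = [], []
while len(y1_draws) < n:
    y1 = rng.gauss(0.0, math.sqrt(0.1))
    y2 = rng.gauss(5.0, math.sqrt(0.1))
    if y1 <= y2:  # discard pairs with Y1 > Y2
        y1_draws.append(y1)
        y2_draws.append(y2)
y1_bar = sum(y1_draws) / n
y2_bar = sum(y2_draws) / n


def m_upper(theta):
    return y2_bar - theta  # sample analogue of E(Y2 - theta) >= 0


def m_lower(theta):
    return theta - y1_bar  # sample analogue of E(theta - Y1) >= 0


psi = 0.01  # small second-stage parameter (illustrative value)
il_c1 = integrated_likelihood([m_upper], [psi], n)                # C^1
il_c2 = integrated_likelihood([m_upper, m_lower], [psi, psi], n)  # C^2
bf12 = il_c1 / il_c2  # the uniform prior (4.7) cancels in the ratio
print(f"BF12 = {bf12:.1f}")
```

The ratio comes out well above one. The larger combination $C^2$ is penalized both by its smaller identified region ($[0, 5]$ rather than $[-10, 5]$) and by the extra factor $\psi$, which is the leading-order behavior described above.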

We will address this problem either by using a more informative prior on the candidate combinations (Approach 1) or by placing uninformative priors on some components of the parameters $\theta$ and $\lambda$ (Approach 2). Either way, the posterior of each candidate is no longer of order $O_p(1)$, and the order of the optimal candidate's posterior will be the largest, which overrides the effects of the prior measure of the identified region and of the product of the $\psi_i$'s.

4.3. Prior selection for consistency of MPC. We propose two approaches 
to address the problem illustrated in Theorem 4.2. 

Approach 1. One approach is to change the priors of all the candidate 
combinations. Instead of the uniform (equally likely) priors, we use unequal 
priors. Specifically, the priors are data-size-dependent and tend to encourage those combinations with more moment inequalities and simpler parameter structures. One such prior could be

(4.8) $p(C_s) \propto n^{\alpha[\dim(M_s) - \dim(\Theta_s)]}$

for some $\alpha > 0$. This choice of prior encourages $C_s$ with large $\dim(M_s) - \dim(\Theta_s)$.
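A minimal sketch of prior (4.8) over all $(2^p - 1)2^k$ combinations follows. It assumes, purely for illustration, that a combination is keyed by the tuple of selected moment indices and the tuple of free (nonzero) parameter indices, so that $\dim(\Theta_s)$ is the number of free parameters; the function name `approach1_prior` is hypothetical. Weights are normalized on the log scale to avoid overflow.

```python
import itertools
import math


def approach1_prior(p, k, n, alpha):
    """Prior (4.8): p(C_s) proportional to n ** (alpha * (dim(M_s) - dim(Theta_s))).

    A combination is keyed by (selected moment indices, free parameter indices);
    an empty parameter tuple is the baseline model with every component at zero.
    """
    log_weights = {}
    for r in range(1, p + 1):  # nonempty subsets of the p moments
        for moments in itertools.combinations(range(1, p + 1), r):
            for t in range(k + 1):  # any subset of the k parameters may be free
                for params in itertools.combinations(range(1, k + 1), t):
                    log_weights[(moments, params)] = (
                        alpha * (len(moments) - len(params)) * math.log(n))
    top = max(log_weights.values())  # subtract the max before exponentiating
    weights = {c: math.exp(w - top) for c, w in log_weights.items()}
    total = sum(weights.values())
    return {c: w / total for c, w in weights.items()}


prior = approach1_prior(p=4, k=2, n=1000, alpha=0.5)
print(len(prior))                 # 60 = (2**4 - 1) * 2**2
print(max(prior, key=prior.get))  # ((1, 2, 3, 4), ())
```

The largest weight goes to the combination with all four moments and no free parameters. Because these log-weights grow only logarithmically in $n$, they cannot offset the $O_p(e^{-\alpha n})$ penalty that Theorem 4.1 places on false combinations.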

One needs to be aware that although $p(C_s) \propto e^{n[\dim(M_s) - \dim(\Theta_s)]}$ also rewards large values of $\dim(M_s) - \dim(\Theta_s)$, we do not recommend its use. This is because we have shown earlier that $p(C_s\mid X^n) = O_p(e^{-\alpha n})p(C_s)$ for a false combination, that is, the posterior probability of a false combination is exponentially small. If an exponentially large prior is used, it may override this "big gap" between the posteriors of false and true combinations.

The drawback of the unequal prior (4.8) is that it is not uniform. In Bayesian analysis, it is usually the data that determine the properties of the posterior, and priors are usually chosen to be uninformative. One may therefore consider another approach to deriving the priors.

Approach 2 . In this approach, we still use the discrete uniform (equally 
likely) prior for the candidate combinations. However, we partition the pa- 
rameters 6 and A into "restricted" and "unrestricted" parts, according to 
the biases of the selected and unselected moment functions. Formally, let 

$\lambda = EM(X,\theta),$

where $M(X,\theta) = (m_1(X,\theta), \ldots, m_p(X,\theta))^T$ is the vector of all the candidate moments and $\theta = (\theta_1, \ldots, \theta_k)^T$ is the vector of full parameters supported on $\Theta_1 \times \cdots \times \Theta_k$. Suppose that a combination $C_s = (M_{s_1}, \Theta_{s_2})$ selects $m$ moment conditions $M_{s_1}$ and leaves the rest of the moments (denoted by $M^c_{s_1}$) unused, while selecting a submodel parameterized by $\theta_s \in \Theta_{s_2}$ and setting all of the other components of $\theta$ (denoted by $\theta^c_s$) to zero. One can view model selection as placing a restriction on $\theta$, while moment selection can be viewed as placing a restriction on the bias $\lambda$.

Let $\lambda_s$ be the subvector of $\lambda$ corresponding to the selected moments, and let $\lambda^c_s$ denote the remaining components of $\lambda$, corresponding to $M^c_{s_1}$. We then have

$EM_{s_1}(X,\theta_s) = \lambda_s, \quad \lambda_s \ge 0,$

$EM^c_{s_1}(X,\theta_s) = \lambda^c_s, \quad \lambda^c_s \in \mathbb{R}^{p-m}.$

The bias $\lambda_s$ for the selected moments is restricted to be nonnegative, while the bias $\lambda^c_s$ for the unselected moments is left unrestricted. Therefore, we can define the restricted parameters as $(\lambda_s, \theta^c_s)$, with restrictions

$\lambda_s \ge 0, \quad \theta^c_s = 0.$
In addition, we call the remaining parameters $(\lambda^c_s, \theta_s)$ unrestricted parameters because $(\lambda^c_s, \theta_s) \in \mathbb{R}^{p-m} \times \Theta_{s_2}$. [We have thus partitioned the moment functions into $M(X,\theta_s) = (M_{s_1}(X,\theta_s)^T, M^c_{s_1}(X,\theta_s)^T)^T$ and the parameters into $\lambda = (\lambda_s, \lambda^c_s)$ and $\theta = (\theta_s, \theta^c_s)$.]

For the unrestricted (selected) parameter $\theta_s$, let $t = \dim(\theta_s)$. We relax the compactness assumption on the support of $\theta_s$ and assume that it is supported on $\mathbb{R}^t$. We then place the following "working" priors on the unrestricted parameters:

(4.9) $p(\lambda^c_s\mid C_s) \sim N_{p-m}(0, \sigma_n^2 I_{p-m}),$

(4.10) $p(\theta_s\mid C_s) \sim N_t(0, \sigma_n^2 I_t),$

where $N_t$ denotes the $t$-dimensional multivariate normal distribution and $I_t$ denotes the $t \times t$ identity matrix. We require that $\sigma_n \to \infty$ as $n$ tends to infinity, but $\sigma_n/e^{\alpha n} \to 0$ for all $\alpha > 0$. Since the variance of each component of $\lambda^c_s$ and $\theta_s$ approaches infinity as the sample size tends to infinity, (4.9) and (4.10) tend to be very flat. Hence, this choice of prior is uninformative. In addition, we still assign an exponential prior to the restricted parameter $\lambda_s$.
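We then include both the selected and unselected moments to construct the limited information likelihood, which depends only on the unrestricted $\theta_s$ since $\theta^c_s = 0$:

$L(X^n\mid\theta_s,\lambda,C_s) = \dfrac{1}{\sqrt{\det(2\pi V/n)}}\exp\Big\{-\dfrac{n}{2}\big(\hat M(\theta_s) - \lambda\big)^T V^{-1}\big(\hat M(\theta_s) - \lambda\big)\Big\},$

where $\hat M(\theta_s) = \frac{1}{n}\sum_{i=1}^n M(X_i,\theta_s)$. The posterior of $C_s$ can then be obtained by integrating out $\theta_s$ and $\lambda = (\lambda_s^T, \lambda_s^{cT})^T$; it is proportional to the "integrated likelihood":

$p(C_s\mid X^n) \propto \int_{\Theta_{s_2}\times[0,\infty)^m\times\mathbb{R}^{p-m}} L(X^n\mid\theta_s,\lambda,C_s)\,p(\theta_s\mid C_s)\,p(\lambda_s\mid C_s)\,p(\lambda^c_s\mid C_s)\,p(C_s)\,d\theta_s\,d\lambda_s\,d\lambda^c_s.$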
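A back-of-the-envelope calculation, offered only as a heuristic and not as the proof of Theorem 4.3, indicates why these working priors reward more selected moments and fewer free parameters. Take $V = I$ for simplicity (an assumption of this sketch). Each unselected moment contributes a Gaussian convolution,

$$\int_{\mathbb{R}} \sqrt{\frac{n}{2\pi}}\,e^{-\frac{n}{2}(\hat m_j - \lambda_j)^2}\,\frac{1}{\sqrt{2\pi\sigma_n^2}}\,e^{-\frac{\lambda_j^2}{2\sigma_n^2}}\,d\lambda_j = \frac{1}{\sqrt{2\pi(\sigma_n^2 + 1/n)}}\exp\Big\{-\frac{\hat m_j^2}{2(\sigma_n^2 + 1/n)}\Big\} \asymp \sigma_n^{-1}$$

for bounded $\hat m_j$, while each selected moment contributes an $O_p(1)$ factor through its exponential prior, and integrating $\theta_s$ over a bounded identified region against the $N_t(0, \sigma_n^2 I_t)$ prior contributes a factor of order $\sigma_n^{-t}$. For a true $C_s$ with $m$ selected moments and $t$ free parameters, this suggests, up to $O_p(1)$ factors,

$$p(C_s\mid X^n) \asymp \sigma_n^{-(p-m)-t} = \sigma_n^{-p}\,\sigma_n^{\,m-t},$$

so the true combination with the largest $m - t = \dim(M_s) - \dim(\Theta_s)$ eventually dominates, and the faster $\sigma_n$ grows, the wider the gap between neighboring values of $m - t$ at a given $n$. The requirement $\sigma_n/e^{\alpha n} \to 0$ keeps this reward from overriding the exponentially small posteriors of false combinations.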

Note that since multivariate normal priors are placed on the unrestricted parameters, the parameter vector $\theta_s$ is no longer supported on a compact set. As a result, deriving the large-sample properties of $p(C_s\mid X^n)$ becomes much harder than in the previous sections, because $EM(X,\theta)$ may not be bounded on the noncompact parameter space. Instead of providing a general proof, we will study the problem for the specific model of Example 1.3, because this is the most interesting example in the Introduction where we consider model and moment selection. In this example, model selection can correspond to selecting the useful explanatory variables and moment selection can correspond to selecting the valid instrumental variables. The key feature of this example is that the moment inequality functions depend linearly on $\theta$. We point out that establishing the consistency of Approach 2 in a more general framework is possible, but would require additional assumptions that are much more technical.

Assumption 4.6. Suppose that the moment inequalities are given by

$EZ(Y_2 - X^T\theta) \ge 0, \quad EZ(X^T\theta - Y_1) \ge 0,$

where $Y_1 \le Y_2$ and $Z$ is a vector of positive random variables. Assume that:

(i) $\operatorname{rank}(EZX^T) = \dim(X)$;

(ii) there exists at least one true candidate combination;

(iii) each true candidate $C_s$ defines a compact identified region.

This assumption rules out those candidates that lead to unbounded iden- 
tified regions, in which case integrals can be infinite. 

The following theorem shows that with either one of the two approaches 
described above, asymptotically, the optimal C* can have all of the desirable 
properties: it is true, it defines the smallest nonempty identified region and 
it corresponds to the simplest model (with the smallest number of free pa- 
rameters). We refer to this result as the consistency of maximum posterior 
criterion for the Bayesian moment/model selection problem. 

Theorem 4.3 (Consistency of MPC). Let

$C^* = \arg\max_{C_s} p(C_s\mid X^n),$

where $p(C_s\mid X^n)$ is obtained from either one of the following:

1. prior (4.8) for candidate combinations, with Assumptions 4.1-4.5;

2. flat prior for candidate combinations, and parameter priors (4.9) and (4.10), with Assumptions 4.2-4.6 for the instrumental variable interval regression model (Example 1.3).

Then, w.p.a.1, $C^*$ satisfies:

1. it is true;

2. among all of the true combinations, it has the largest $\dim(M_s) - \dim(\Theta_s)$.

5. Monte Carlo experiments. This section presents some Monte Carlo simulation results. We first provide evidence on the finite-sample behavior of the consistent estimators described in the previous sections, as well as of the posterior distribution. We simulate the models described in Examples 1 and 2 of Chernozhukov, Hong and Tamer (2007). We then show simulated evidence of the consistency of MPC for the moment/model selection problem.
Table 1
Estimation based on posterior density

$\epsilon_n$    $\sqrt{n}$           $\ln n$              $\ln\ln n$
n = 500         [-0.2841, 5.2634]    [-0.123, 5.113]      [-0.0389, 4.702]
n = 1000        [-0.2362, 5.2267]    [-0.1135, 5.0977]    [-0.0342, 4.9110]
n = 5000        [-0.1158, 5.1233]    [-0.0477, 5.0476]    [-0.0202, 4.9779]

Example 5.1 (Interval data). Consider Example 1.1 described in Section 2. The parameter of interest is $\theta = E(Y)$ with moment inequalities

$E(Y_2 - \theta) \ge 0, \quad E(\theta - Y_1) \ge 0.$

We set $Y_1 \sim N(0, 0.1)$ and $Y_2 \sim N(5, 0.1)$, so that $\Omega = [0, 5]$. $Y_1$ and $Y_2$ are generated independently and observations with $Y_1 > Y_2$ are discarded. We also set $\psi_1 = 0.1$, $\psi_2 = 0.5$ and $V = I$, the identity matrix, in the likelihood function. In addition, we place a flat prior on $\theta$. We report the estimated identified interval of $\theta$ described both in Theorem 3.2 with $g(\theta) = \theta$ and in Theorem 3.3, for sample sizes $n = 500$, $1000$, $5000$ and various choices of the tuning sequences $\epsilon_n$ and $\pi_n$.

Table 1 reports the estimation of $\Omega$ as in Theorem 3.3. To compare the results corresponding to the choices of $\epsilon_n$, for each interval $[a, b]$ we calculate $\gamma = (a - 0)^2 + (b - 5)^2$. We find that $\epsilon_n = \ln\ln n$ performs better than the other two choices since it has a lower $\gamma$ value.
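The comparison by $\gamma$ is simple arithmetic on the endpoints of Table 1. The short sketch below recomputes it from the values as transcribed here; the dictionary layout and labels are ours.

```python
# Interval endpoints [a, b] as transcribed in Table 1; the true Omega is [0, 5].
TABLE1 = {
    500: {"sqrt n": (-0.2841, 5.2634), "ln n": (-0.123, 5.113), "ln ln n": (-0.0389, 4.702)},
    1000: {"sqrt n": (-0.2362, 5.2267), "ln n": (-0.1135, 5.0977), "ln ln n": (-0.0342, 4.9110)},
    5000: {"sqrt n": (-0.1158, 5.1233), "ln n": (-0.0477, 5.0476), "ln ln n": (-0.0202, 4.9779)},
}


def gamma(interval, a0=0.0, b0=5.0):
    """gamma = (a - 0)^2 + (b - 5)^2, the squared distance to the true endpoints."""
    a, b = interval
    return (a - a0) ** 2 + (b - b0) ** 2


best = {}
for n, row in TABLE1.items():
    scores = {eps: gamma(iv) for eps, iv in row.items()}
    best[n] = min(scores, key=scores.get)
    print(n, {eps: round(g, 4) for eps, g in scores.items()}, "smallest:", best[n])
```

With the values as transcribed, $\ln\ln n$ gives the smallest $\gamma$ at $n = 1000$ and $n = 5000$, while at $n = 500$ the $\ln n$ column does, because of the upper endpoint 4.702.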

To construct the estimator based on the posterior c.d.f., we run the Metropolis algorithm to draw $B = 5000$ samples from the posterior distribution and then calculate the $\pi_n$- and $(1 - \pi_n)$-quantiles of the empirical c.d.f. for various choices of $\pi_n$. For the Metropolis algorithm, we set the initial value $\theta^0 = 1$ and the jump distribution $\theta^* \sim N(\theta_j, 0.5)$. Table 2 reports the findings with $\pi_n = e^{-\cdots}$, $n^{-1}$ and $1/\ln n$. The choice $\pi_n = 1/n$ appears to be better than the other two. We also note that $\pi_n = 1/\ln n$ tends to zero too slowly to estimate the entire identified interval: the estimated interval shrinks too far inside $\Omega$.
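A minimal random-walk Metropolis sketch for Example 5.1 is given below. It is not the authors' implementation and assumes: a flat prior on a hypothetical compact $\Theta = [-10, 10]$; the jump distribution $N(\theta_j, 0.5)$ read as having variance 0.5; the exponential prior in the rate form $\psi e^{-\psi\lambda}$; and $\lambda$ integrated out analytically, which is possible here because $V = I$, so each moment contributes the closed-form factor $\psi_j\exp(-\psi_j\hat m_j + \psi_j^2/(2n))\,\Phi(\sqrt{n}(\hat m_j - \psi_j/n))$. The quantiles use $\pi_n = 1/n$.

```python
import math
import random


def norm_cdf(x):
    """Standard normal c.d.f. via the complementary error function."""
    return 0.5 * math.erfc(-x / math.sqrt(2.0))


def log_lambda_factor(m_hat, psi, n):
    """Log of the closed-form lambda-integral for one moment (V = 1, rate-form prior)."""
    cdf = norm_cdf(math.sqrt(n) * (m_hat - psi / n))
    if cdf <= 0.0:  # far outside the identified set the c.d.f. underflows
        return -math.inf
    return math.log(psi) - psi * m_hat + psi ** 2 / (2.0 * n) + math.log(cdf)


def log_posterior(theta, y1_bar, y2_bar, n, psi1=0.1, psi2=0.5, lo=-10.0, hi=10.0):
    """Unnormalized log posterior of theta under a flat prior on an assumed [lo, hi]."""
    if not lo <= theta <= hi:
        return -math.inf
    return (log_lambda_factor(y2_bar - theta, psi1, n)     # E(Y2 - theta) >= 0
            + log_lambda_factor(theta - y1_bar, psi2, n))  # E(theta - Y1) >= 0


def metropolis(y1_bar, y2_bar, n, draws=5000, theta0=1.0, jump_var=0.5, seed=0):
    """Random-walk Metropolis with Gaussian jumps of variance jump_var."""
    rng = random.Random(seed)
    theta = theta0
    current = log_posterior(theta, y1_bar, y2_bar, n)
    samples = []
    for _ in range(draws):
        proposal = rng.gauss(theta, math.sqrt(jump_var))
        candidate = log_posterior(proposal, y1_bar, y2_bar, n)
        # Accept with probability min(1, exp(candidate - current)).
        if math.log(1.0 - rng.random()) < candidate - current:
            theta, current = proposal, candidate
        samples.append(theta)
    return samples


def quantile_interval(samples, pi_n):
    """Empirical pi_n- and (1 - pi_n)-quantiles of the draws."""
    ordered = sorted(samples)
    b = len(ordered)
    return ordered[int(pi_n * b)], ordered[min(b - 1, int((1.0 - pi_n) * b))]


data_rng = random.Random(42)
n = 500
y1s, y2s = [], []
while len(y1s) < n:
    y1, y2 = data_rng.gauss(0.0, math.sqrt(0.1)), data_rng.gauss(5.0, math.sqrt(0.1))
    if y1 <= y2:  # discard observations with Y1 > Y2
        y1s.append(y1)
        y2s.append(y2)
samples = metropolis(sum(y1s) / n, sum(y2s) / n, n)
lower, upper = quantile_interval(samples, 1.0 / n)
print(f"[{lower:.4f}, {upper:.4f}]")
```

The printed interval should lie close to $[0, 5]$; its exact endpoints depend on the seeds and on the assumed $\Theta$.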



Table 2
Estimation based on empirical c.d.f.

$\pi_n$     $e^{-\cdots}$        $1/n$                $1/\ln n$
n = 500     [-0.0716, 5.0418]    [-0.0498, 5.0069]    [0.4048, 3.3447]
n = 1000    [-0.0422, 4.9983]    [-0.0383, 5.0164]    [0.3304, 3.2542]
n = 5000    [-0.0155, 5.0098]    [-0.0063, 4.9927]    [0.2717, 3.8012]
Fig. 1. Example 5.1: the posterior density function of $\theta$ (curves: flat prior; prior $N(0, 0.25)$).



In addition, Figure 1 plots the posterior density function of $\theta$ with two choices of priors: a flat prior and an $N(0, 0.25)$ prior. Theoretically, one needs to truncate the normal distribution so that the priors are supported on a compact set. However, since the tail of the normal density function is very thin and we can choose a very large parameter space, we believe a normal prior is workable here. We see that when a flat prior is used, the posterior density function is high on the entire identified interval $[0, 5]$, but when the prior is set to be $N(0, 0.25)$, most posterior mass falls in $[0, 2]$, which tends to underestimate the true identified interval. However, with this more informative prior, the posterior provides more information about the location of $\theta$.

Example 5.2 (Interval outcomes in regression models). We simulate the instrumental inequality model described in Example 1.3,

$E(ZY_1) \le E(ZX^T)\theta \le E(ZY_2),$

where $\theta = (\theta_1, \theta_2)^T$, $X = (X_1, X_2)^T$ and $Y = (Y_1, Y_2)^T \in \mathbb{R}^2$. Generate $X \sim N_2((1, 1)^T, I_2)$. Let $Z_1 = X_1 + X_2$ and $Z_2 = X_1 + 2X_2$. Generate $Y_1 \sim N(3, 0.1)$ and $Y_2 \sim N(6, 0.1)$ independently. We discard a generated observation if either $Z_1$ or $Z_2$ is negative. The identified region is $\Omega = \{\theta: 2 \le \theta_1 + \theta_2 \le 4,\ 9 \le 4\theta_1 + 5\theta_2 \le 18\}$, a two-dimensional region with a parallelogram boundary. To estimate this model, we set $\psi = (0.1, 0.1, 0.5, 0.5)^T$ and $V = I$. Fixing the sample size at $n = 500$, we implement the Metropolis algorithm to draw $B = 5000$ samples from the posterior distribution.
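The stated identified region can be traced back to the design. The sketch below computes the population moments $E(ZX^T)$, $E(ZY_1)$ and $E(ZY_2)$ for the Gaussian design, ignoring the step that discards observations with negative $Z$ (that truncation shifts these moments somewhat), and then finds the corners of the parallelogram by Cramer's rule. The helper names are hypothetical.

```python
def moment_rows():
    """Population rows (E(Z X^T), E(Z Y1), E(Z Y2)) for Z1 = X1 + X2, Z2 = X1 + 2 X2.

    Assumes X ~ N2((1, 1)^T, I2), so E(X X^T) = I2 + (1, 1)^T (1, 1), and
    Y1, Y2 independent of X with means 3 and 6; the discarding of negative Z
    is ignored.
    """
    exx = [[2.0, 1.0], [1.0, 2.0]]  # E(X X^T)
    ex = [1.0, 1.0]                 # E(X)
    rows = []
    for a in ([1.0, 1.0], [1.0, 2.0]):  # Z = a[0] * X1 + a[1] * X2
        ezx = [a[0] * exx[0][j] + a[1] * exx[1][j] for j in range(2)]
        ez = a[0] * ex[0] + a[1] * ex[1]
        rows.append((ezx, 3.0 * ez, 6.0 * ez))
    return rows


def in_region(theta, rows=None):
    """Check E(Z Y1) <= E(Z X^T) theta <= E(Z Y2) for both instruments."""
    rows = moment_rows() if rows is None else rows
    return all(lo <= c[0] * theta[0] + c[1] * theta[1] <= hi for c, lo, hi in rows)


def vertices(rows=None):
    """Corners of the parallelogram, by Cramer's rule on pairs of active bounds."""
    (c1, lo1, hi1), (c2, lo2, hi2) = moment_rows() if rows is None else rows
    det = c1[0] * c2[1] - c1[1] * c2[0]
    corners = []
    for r1 in (lo1, hi1):
        for r2 in (lo2, hi2):
            corners.append(((r1 * c2[1] - c1[1] * r2) / det,
                            (c1[0] * r2 - r1 * c2[0]) / det))
    return corners


print(moment_rows())        # [([3.0, 3.0], 6.0, 12.0), ([4.0, 5.0], 9.0, 18.0)]
print(vertices())           # [(1.0, 1.0), (-8.0, 10.0), (11.0, -7.0), (2.0, 2.0)]
print(in_region((10, -6)))  # True
```

Dividing the first row by 3 gives $2 \le \theta_1 + \theta_2 \le 4$, and the second row gives $9 \le 4\theta_1 + 5\theta_2 \le 18$, matching $\Omega$. The point $(10, -6)$ lies on the edge $\theta_1 + \theta_2 = 4$, next to the corner $(11, -7)$.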

We first put a flat prior on $\theta$. Figure 2 (left) displays the parallelogram boundary of $\Omega$, as well as 5000 draws from the posterior distribution. Most of the draws fall uniformly inside the identified set, except for those close to two opposite corners of the parallelogram. We can see that there is a small "bias" at the boundaries.
Fig. 2. Example 5.2: the identified set and MCMC draws. Left: flat prior; right: prior (5.1).

In order to show that, when a more informative prior is applied, the posterior distribution indeed provides more information about the location of the true parameter inside the identified region, we repeat the same MCMC procedure, but with prior distribution

(5.1) $\theta_1 \sim N(10, 12^2), \quad \theta_2 \sim N(-6, 12^2),$

where $\theta_1$ and $\theta_2$ are a priori independent. This prior can be used when, for instance, a previous study estimates that $E\theta_1 \approx 10$ and $E\theta_2 \approx -6$, with the same standard deviation, 12. Figure 2 (right) displays 5000 MCMC draws from the posterior derived from prior (5.1). We see that the draws mostly concentrate at the lower-right corner inside the identified region, which is close to $(10, -6)$, showing that our Bayesian approach indeed provides more information on $\theta$ in this case than the frequentist method; the latter would only estimate the identified region and provide a confidence set, but would not tell how $\theta$ is distributed inside it.

Example 5.3 (Moment selection: interval censored data). Suppose $\theta \in \Theta \subset \mathbb{R}$. We consider four moment conditions:

(5.2) $EY_1 \ge \theta$;

(5.3) $EY_2 \le \theta$;

(5.4) $EY_3 \le \theta$;

(5.5) $EY_4 \ge \theta$.

If we assume that $\Theta = [0, 10]$ and $EY_1 < 0 < EY_2 < EY_3 < EY_4 < 10$, then, for this fixed $\Theta$, (5.2) is incorrect. We generate i.i.d. data from $Y_1 \sim N(-1, 0.1)$, $Y_2 \sim N(1, 0.1)$, $Y_3 \sim N(2, 0.1)$ and $Y_4 \sim N(3, 0.1)$, with $n = 100$, $1000$ and $5000$. We keep $\Theta$ fixed and use the prior described in Section 4.3, Approach 2, to construct the posterior probabilities for the $2^4 - 1 = 15$ candidate combinations of moments. We expect each combination including (5.2) to have a posterior probability close to zero for large $n$, and the combination [(5.3), (5.4), (5.5)] to have the highest posterior probability.

Table 3
Posterior probabilities, $\sigma_n^2 = n$. Set of true moments = {(5.3), (5.4), (5.5)}

Moments     (5.3),(5.4),(5.5)   (5.3),(5.4)   (5.3),(5.5)   (5.4),(5.5)   (5.3)    (5.4)    (5.5)
n = 100     0.0076              0.0863        0.0398        0.0222        0.3564   0.3304   0.1572
n = 1000    0.0546              0.1979        0.0893        0.0502        0.2568   0.2387   0.1125
n = 5000    0.1645              0.2711        0.1230        0.0677        0.1580   0.1466   0.0691

The simulation results show that, if $\sigma_n^2 = n$, then although the posterior probability of the combinations including (5.2) goes to zero quickly, the posterior probability of [(5.3), (5.4), (5.5)] is still quite small, even with $n = 5000$, and is not the largest among the true combinations (Table 3). Hence, the choice $\sigma_n^2 = n$ is too conservative. However, when $\sigma_n^2 = n^2$, the simulation results are exactly as expected: for $n = 1000$ and $5000$, the combination [(5.3), (5.4), (5.5)] has the largest posterior probability (Table 4). The combinations not listed are those including (5.2); they all have almost zero posterior probability, as desired.
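Which of the 15 candidate combinations in Example 5.3 are true, in the sense of Definition 4.1, can be read off from the population means alone. The sketch below uses the means of the simulation design ($EY_1 = -1$, $EY_2 = 1$, $EY_3 = 2$, $EY_4 = 3$) and $\Theta = [0, 10]$; the names and the string labels for the moments are illustrative.

```python
from itertools import combinations

# Population means in the simulation design of Example 5.3.
MEANS = {"5.2": -1.0, "5.3": 1.0, "5.4": 2.0, "5.5": 3.0}
# (5.2) and (5.5) read EY >= theta (upper bounds on theta);
# (5.3) and (5.4) read EY <= theta (lower bounds on theta).
UPPER_BOUNDS = {"5.2", "5.5"}
THETA = (0.0, 10.0)


def identified_interval(moments):
    """Identified interval for a set of moments, or None if it is empty."""
    lo, hi = THETA
    for label in moments:
        if label in UPPER_BOUNDS:
            hi = min(hi, MEANS[label])
        else:
            lo = max(lo, MEANS[label])
    return (lo, hi) if lo <= hi else None


labels = sorted(MEANS)
candidates = [c for r in range(1, len(labels) + 1) for c in combinations(labels, r)]
true_combos = {c: identified_interval(c) for c in candidates
               if identified_interval(c) is not None}
print(len(candidates), len(true_combos))  # 15 7
for combo, interval in true_combos.items():
    print(combo, interval)
```

Exactly the seven combinations without (5.2) are true, and among them {(5.3), (5.4), (5.5)} has the most moments and the smallest identified region, $[2, 3]$.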



6. Discussion. In this paper, we assume that the interior of the identified region, $\operatorname{int}(\Omega)$, is not empty. The case when $\operatorname{int}(\Omega)$ is empty is more complicated since there is no open set contained in $\Omega$. When $\Omega$ has no interior, moment inequality models may contain exact moment conditions:

$Em_{1j}(X,\theta_0) \ge 0, \quad Em_{2j}(X,\theta_0) = 0.$

The identified region is then defined by

$\Omega = \{\theta: Em_1(X,\theta) \ge 0,\ Em_2(X,\theta) = 0\}.$



Table 4
Posterior probabilities, $\sigma_n^2 = n^2$. Set of true moments = {(5.3), (5.4), (5.5)}

Moments     (5.3),(5.4),(5.5)   (5.3),(5.4)   (5.3),(5.5)   (5.4),(5.5)   (5.3)    (5.4)    (5.5)
n = 100     0.2344              0.2879        0.1290        0.0682        0.1192   0.1104   0.0509
n = 1000    0.8286              0.0952        0.0428        0.0241        0.0039   0.0036   0.0017
n = 5000    0.9615              0.0223        0.0101        0.0056        0.0002   0.0002   0.0001
One of the problems one needs to take into account when considering the asymptotic behavior of the posterior distribution is that $\Omega$ has zero Lebesgue measure, due to the loss of dimensionality. Thus, integrating over $\Omega$ always produces zero. For reasons of brevity, we do not provide a detailed discussion of this case. We point out that, in this case, a dense subset of $\Omega$ still plays an important role in characterizing the large-sample behavior of the posterior distribution. Define $\Gamma = \{\theta \in \Omega: Em_1(X,\theta) > 0\}$. By assuming that $\Gamma$ is dense in $\Omega$, it can still be shown that there is a large "gap" between the large-sample posterior behavior inside and outside the identified region. Inside the identified region, instead of being bounded below by a positive constant, the posterior density function can be shown to be bounded below at a polynomial rate. However, it still goes to zero exponentially fast outside the identified region. Interested readers are referred to our technical report Liao and Jiang (2008).

In partially identified models, there are two different ways to make infer- 
ences: one is studying the identified region (including consistent estimation 
and constructing confidence regions), while the other is directly studying 
the true parameter. The simulation results demonstrate that, for the first goal, a flat prior is appropriate, while for the second goal an informative prior is preferable. Hence, in the latter case, one should incorporate as much prior information as possible. We believe our Bayesian method
is more advantageous than the frequentist method when dealing with the 
second goal since the posterior distribution can provide more information 
about the inside of the identified region because of the prior distribution. 
The simulation results have verified our beliefs. 

Recently, Moon and Schorfheide (2009) have considered the Bayesian approach to partially identified models when the model can involve three types of parameters: the structural parameters of interest $\theta$, a reduced-form parameter vector $\phi$ that is point-identified by the data and also a vector of auxiliary parameters $\alpha$ which links structural and reduced-form parameters via some known function $\theta = \theta(\phi, \alpha)$. For a particular value of $\phi$, the auxiliary parameter takes its value in some set $A_\phi$, and the identified set can then be written as

$\Theta(\phi) = \{\theta = \theta(\phi, \alpha): \alpha \in A_\phi\}.$

After specifying a prior distribution for both $\phi$ and $\alpha$ and combining it with a likelihood function of $\phi$, a joint posterior of $\alpha$ and $\phi$ is derived, which also determines the posterior of $\theta$ via $\theta = \theta(\phi, \alpha)$. The authors also derive Bayesian credible sets and compare them with frequentist confidence intervals for a number of particular models where $\theta(\phi, \alpha)$ is linear in $(\phi, \alpha)$ and does not involve other functions of the unknown data distribution. However, one of the main challenges of their approach is that it often requires reparametrizations between $(\theta, \alpha)$ and $(\phi, \alpha)$. Initially, it is often more natural to place a prior on the structural parameter $\theta$ and on $\alpha\mid\theta$, but it may be inconvenient to derive the distributions $p(\phi)$ and $p(\alpha\mid\phi)$ from $p(\theta, \alpha)$. Another challenge is that, in some models that define the relation $\theta(\phi, \alpha)$ implicitly, if $\dim(\phi) > \dim(\theta)$, it is nontrivial to specify prior distributions $p(\phi)$ and $p(\alpha\mid\phi)$ such that there is a solution $\theta = \theta(\phi, \alpha)$. Also, if $\theta(\phi, \alpha)$ involves an unknown distribution of the data-generating process, there is extra variance to account for when estimating it.

In contrast to Moon and Schorfheide (2009), we proceed in a different 
framework of moment inequalities, one which does not require modelling 
the likelihood function. We construct the posterior distribution of the struc- 
tural parameter using the limited information likelihood and then study 
the frequentist properties of the posterior. In addition, we also study the 
problem of model/moment selection, which is not addressed by Moon and 
Schorfheide (2009). 

Based on the posterior distribution, we can, in principle, construct a cred- 
ible set for the true parameter conditional on the data with a required cov- 
erage probability using our method (this is beyond the scope of this paper, 
but it is straightforward, using the posterior density function). Moon and 
Schorfheide (2009) have derived a Bayesian credible set for the true pa- 
rameter and then compared it with the frequentist confidence interval and 
concluded that while frequentist confidence intervals usually extend beyond 
the boundaries of the identified set, the Bayesian credible sets tend to be 
located in the interior of the identified set. In the framework of this paper, 
it is also possible to derive a Bayesian credible set for the identified region 
if one can express the identified region explicitly in terms of 9 and A, an 
interesting topic for future work. 

APPENDIX A: PROOFS FOR SECTION 2 

A.l. Proof of Theorem 2.1. Let giQ)'" = {x G g{n) :d{x,g{Qy) > e}, 
g{n)+^ = {x€giQ):dix,gin))<e}. 

For all e > 0, we proceed in two steps: first, show 3N G N such that when 

n> A^, Ve>0, 

9{^)~' C g 

and then show 3N G N such that when n > N , \/e > 0, g C g{il.)'^^. 

Let inig{n) =mi0^ng{9) and supg{Q) = supggf^ 9(6'). 

Step I-l. Show that g{i^) = [ini g{Q),sup g{Q)]: obviously, g{i^) C [inf g{il.), 
sup5f(ri)]. On the other hand, Vx G [inf (7(il),sup5f(r2)], since is compact, 
36*1,6*2 G so that ^(^i) < x < 5(^2)- By assumptions, Q is connected and g 
is continuous. By the intermediate value theorem, 3^* £Q, x = g{9*). Hence, 
x£g{Q). 

Step 1-2. Show that 39* G A and a ball B{9*,R*) such that B{9*,R*) C 
{9 G @:g{9) < mf9^Qg{Qy^}: in fact, Ve > 0, it follows by step I-l that 
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g{Q,) ^ = [mfg{Q) + e, sup(7(0) — e]. Hence, mig^Qg{^}) ^ = inf g{Q) + e. 
Moreover, 39i G il., g{Oi) < mi g{Q) + e. By the continuity of g, there exists a 
ball B{6i,R) such that Vw G B{ei,R), g{uj) < inf g{VL)+e. Hence, B{9i,R) C 

{9ee:g{e)<mfe^n9m-'}. 

If 6*1 G A, then let 6* = Oi, R* = R. If 6'i G $7 \ A, since A is dense in 
il, i3(6'i, n A / Arbitrarily pick up an element 02 £ A D B{9i, §), 
V^GB(^2,|), then ^i) < ^2) + ^(^2, 6'i) < | + | < i?- Hence, ^G 
5(^1, ii). It follows that 5(^2,f) C 5(01, ii) C{eGe:5(^) <infegn5(^7)"^} 
and 02 G A. Let 0* = 62, R* = |. 

Step 1-3. Show that g{Q)~^ C g for large ?i: by assumption 2 of The- 
orem 2.1, for 0*, there exists Rg* and G N such that when p < Rq* 
and n>iV, P(6i G 5(r , > 7r„ w.p.a.L If we let i?i = min{i?0. , i?*}, 
then B{e*,Ri) C {6^ G G : 5(6*) < infee^ ^(O)-^}. Hence, when n > iV, Va; G 

Fg(rc) = Pig{9) < > 5(^(0) < inf A") 

>P(0GB(r,i?i)|X")>7r„. 

Hence, x > F~^{TTn). Likewise, we can show that x < F~^{1 — Tin). Therefore, 

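Step II. Show for large $n$ that $[F_g^{-1}(\pi_n),F_g^{-1}(1-\pi_n)]\subset g(\Omega)^{\epsilon}$: step I-1 implies that $g(\Omega)^{\epsilon}=[\inf g(\Omega)-\epsilon,\ \sup g(\Omega)+\epsilon]$. For every $x\in[g(\Omega)^{\epsilon}]^c$, either $x<\inf g(\Omega)-\epsilon$ or $x>\sup g(\Omega)+\epsilon$. If $x<\inf g(\Omega)-\epsilon$, then $\{\theta\in\Theta:g(\theta)\le x\}\subset\{\theta\in\Theta:g(\theta)<\inf g(\Omega)-\epsilon\}$. In addition, since $g$ is continuous on $\Theta$, $\exists\,\delta>0$ such that $g(\theta)>\inf g(\Omega)-\epsilon$ whenever $d(\theta,\Omega)<\delta$. Therefore, every $\theta\in\{\theta:g(\theta)<\inf g(\Omega)-\epsilon\}$ satisfies $d(\theta,\Omega)\ge\delta$, which implies that $\{\theta:g(\theta)<\inf g(\Omega)-\epsilon\}\subset(\Omega^{\delta})^c$. By assumption 1 of Theorem 2.1, $\exists\,N\in\mathbb N$ such that, when $n>N$, $P(\theta\in(\Omega^{\delta})^c\mid X^n)<\pi_n$ w.p.a.1. It follows that

$$P(g(\theta)\le x\mid X^n)\le P\big(g(\theta)<\inf g(\Omega)-\epsilon\mid X^n\big)\le P(\theta\in(\Omega^{\delta})^c\mid X^n)<\pi_n.$$

Hence, $x<F_g^{-1}(\pi_n)$. If $x>\sup g(\Omega)+\epsilon$, then, by a similar argument, we can show that $x>F_g^{-1}(1-\pi_n)$. Therefore, for $n>N$, if $x\in[F_g^{-1}(\pi_n),F_g^{-1}(1-\pi_n)]$, then $x\in g(\Omega)^{\epsilon}$. This implies that $[F_g^{-1}(\pi_n),F_g^{-1}(1-\pi_n)]\subset g(\Omega)^{\epsilon}$.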

Combining steps I and II, since $\epsilon$ is arbitrary, $d_H\big([F_g^{-1}(\pi_n),F_g^{-1}(1-\pi_n)],\,g(\Omega)\big)\to0$ in probability.
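In practice the interval $[F_g^{-1}(\pi_n),F_g^{-1}(1-\pi_n)]$ would be read off posterior draws of $g(\theta)$. The following is a minimal Python sketch, assuming such draws are already available from some posterior sampler (not shown) and using the generalized inverse $F^{-1}(u)=\inf\{x:F(x)\ge u\}$ of the empirical CDF. The function names and the uniform stand-in "posterior" are hypothetical. For two compact intervals, the Hausdorff distance reduces to the larger of the two endpoint discrepancies.

```python
import math
import random


def posterior_quantile_interval(draws, pi_n):
    """Empirical [F^{-1}(pi_n), F^{-1}(1 - pi_n)] from posterior draws of g(theta).

    F^{-1}(u) = inf{x : F(x) >= u} for the empirical CDF F of the draws.
    """
    xs = sorted(draws)
    n = len(xs)

    def quantile(u):
        # smallest order statistic x_(k) whose empirical CDF value k / n is >= u
        k = min(max(1, math.ceil(u * n)), n)
        return xs[k - 1]

    return quantile(pi_n), quantile(1.0 - pi_n)


def hausdorff_intervals(a, b):
    """Hausdorff distance between compact intervals a = (a0, a1) and b = (b0, b1)."""
    return max(abs(a[0] - b[0]), abs(a[1] - b[1]))


if __name__ == "__main__":
    rng = random.Random(0)
    # hypothetical stand-in for posterior draws of g(theta), with g(Omega) = [0, 1]
    draws = [rng.random() for _ in range(100_000)]
    interval = posterior_quantile_interval(draws, pi_n=0.001)
    print(interval, hausdorff_intervals(interval, (0.0, 1.0)))
```

Shrinking `pi_n` toward zero widens the empirical interval toward the full support of the draws, which mirrors the role of $\pi_n\to0$ in the theorem.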

APPENDIX B: PROOFS FOR SECTION 3 

Throughout the proofs, $\phi$ denotes the empty set and $\mu(A)$ denotes the Lebesgue measure of set $A$.
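B.1. Proof of Lemma 3.1. Recall that $(\Omega^{\epsilon})^c=\{\theta:d(\theta,\Omega)\ge\epsilon\}$, which is compact. For every $\theta\in(\Omega^{\epsilon})^c$, $\min_jEm_j(X,\theta)/\sqrt{V_{jj}}<0$. Hence, $\exists\,\theta^*\in(\Omega^{\epsilon})^c$ such that

$$\sup_{\theta\in(\Omega^{\epsilon})^c}\min_j\frac{Em_j(X,\theta)}{\sqrt{V_{jj}}}=\min_j\frac{Em_j(X,\theta^*)}{\sqrt{V_{jj}}}<0.$$

If we let

$$\delta=-\sup_{\theta\in(\Omega^{\epsilon})^c}\min_j\frac{Em_j(X,\theta)}{\sqrt{V_{jj}}}>0,$$

then, for every $\theta\in(\Omega^{\epsilon})^c$, $\min_jEm_j(X,\theta)/\sqrt{V_{jj}}\le-\delta<-\delta/2$, which implies that $(\Omega^{\epsilon})^c\subset A_{\delta/2}$. Hence, $P(\theta\in(\Omega^{\epsilon})^c\mid X^n)\le P(\theta\in A_{\delta/2}\mid X^n)=O_p(a_n)$.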
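The constant $\delta$ in this proof can be made concrete in a toy case. The sketch below assumes a hypothetical interval model with $Em_1(X,\theta)=\mu_U-\theta$, $Em_2(X,\theta)=\theta-\mu_L$ and $V_{11}=V_{22}=v$, so that $\Omega=[\mu_L,\mu_U]$, and it approximates the supremum on a finite grid. In this model one expects $\delta=\epsilon/\sqrt v$. The function name is hypothetical.

```python
import math


def contraction_gap(eps, mu_lo, mu_hi, v, grid):
    """delta = -sup over {theta : d(theta, Omega) >= eps} of min_j E m_j / sqrt(V_jj),
    for the toy model E m_1 = mu_hi - theta, E m_2 = theta - mu_lo, V_11 = V_22 = v,
    with the sup taken over the grid points only."""
    def min_std_moment(theta):
        return min(mu_hi - theta, theta - mu_lo) / math.sqrt(v)

    def dist_to_omega(theta):
        # distance from theta to Omega = [mu_lo, mu_hi]
        return max(mu_lo - theta, theta - mu_hi, 0.0)

    outside = [t for t in grid if dist_to_omega(t) >= eps]
    return -max(min_std_moment(t) for t in outside)
```

For example, with $\epsilon=0.2$, $\Omega=[0,1]$ and $v=4$, a grid of step $0.001$ returns a value just above $0.1=\epsilon/\sqrt v$.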

B.2. Proof of Theorem 3.1. The following lemma is useful. 

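Lemma B.1. With probability 1,

$$P(Z\ge0)\ge1-p\,\Phi\Big(-\sqrt n\min_j\frac{\hat m_j(\theta)-(V\psi)_j/n}{\sqrt{V_{jj}}}\Big),\tag{B.1}$$

$$P(Z\ge0)\le\Phi\Big(\sqrt n\min_j\frac{\hat m_j(\theta)-(V\psi)_j/n}{\sqrt{V_{jj}}}\Big).\tag{B.2}$$

Proof. Let $Z=(Z_1,\dots,Z_p)^T$, so that $Z_j\sim N\big(\hat m_j(\theta)-(V\psi)_j/n,\ V_{jj}/n\big)$.

(B.1):
$$P(Z\ge0)=1-P\Big(\bigcup_j\{Z_j<0\}\Big)\ge1-\sum_jP(Z_j<0)\ge1-p\,\Phi\Big(-\sqrt n\min_j\frac{\hat m_j(\theta)-(V\psi)_j/n}{\sqrt{V_{jj}}}\Big).$$

(B.2):
$$P(Z\ge0)\le\min_jP(Z_j\ge0)=\Phi\Big(\sqrt n\min_j\frac{\hat m_j(\theta)-(V\psi)_j/n}{\sqrt{V_{jj}}}\Big).$$

□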
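The two bounds can be checked by simulation. The sketch below assumes, for simplicity only, that $V$ is diagonal (the lemma itself does not need this), and it takes the mean vector $\hat m(\theta)-V\psi/n$ directly as the argument `mean`. The function names are hypothetical.

```python
import math
import random


def std_normal_cdf(x):
    return 0.5 * math.erfc(-x / math.sqrt(2.0))


def lemma_b1_bounds(mean, var, n):
    """Lower bound (B.1) and upper bound (B.2) on P(Z >= 0) for Z ~ N_p(mean, V/n),
    where var[j] = V_jj and mean plays the role of m_hat(theta) - V psi / n."""
    p = len(mean)
    s = math.sqrt(n) * min(m / math.sqrt(v) for m, v in zip(mean, var))
    return 1.0 - p * std_normal_cdf(-s), std_normal_cdf(s)


def mc_prob_nonnegative(mean, var, n, draws=100_000, seed=0):
    """Monte Carlo estimate of P(Z >= 0), assuming independent coordinates (diagonal V)."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(draws):
        if all(rng.gauss(m, math.sqrt(v / n)) >= 0 for m, v in zip(mean, var)):
            hits += 1
    return hits / draws
```

With three coordinates the union bound in (B.1) can be loose (it may even be negative for small $\sqrt n\min_j$), while (B.2) uses only the worst coordinate. Both bounds tighten as $\sqrt n\min_j(\cdot)$ grows, which is the regime used in the proofs.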



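Proof of Theorem 3.1. 1. According to Lemma 3.1, it suffices to show that, w.p.a.1, for any $\delta>0$, $P(\theta\in A_\delta\mid X^n)=O_p(e^{-an})$ for some $a>0$. Define

$$\hat A_\delta=\Big\{\theta:\min_j\frac{\hat m_j(\theta)}{\sqrt{V_{jj}}}<-\delta\Big\}.$$

Then

$$P(\theta\in A_\delta\mid X^n)\propto\int_{A_\delta}p(\theta)L(\theta)\,d\theta\le\int_{\hat A_\delta}p(\theta)L(\theta)\,d\theta+\int_{A_\delta\cap\hat A_\delta^c}p(\theta)L(\theta)\,d\theta,$$

where

$$A_\delta\cap\hat A_\delta^c=\Big\{\theta:\min_j\frac{Em_j(X,\theta)}{\sqrt{V_{jj}}}<-\delta\Big\}\cap\Big\{\theta:\frac{\hat m_i(\theta)}{\sqrt{V_{ii}}}\ge-\delta,\ i=1,\dots,p\Big\}=\bigcup_{j=1}^pA_j,$$

$$A_j=\Big\{\theta:\frac{Em_j(X,\theta)}{\sqrt{V_{jj}}}<-\delta\Big\}\cap\Big\{\theta:\frac{\hat m_i(\theta)}{\sqrt{V_{ii}}}\ge-\delta,\ i=1,\dots,p\Big\}.$$

By the weak law of large numbers, $A_j\to\phi$. Hence, $\mu(A_j)\to0$ in probability for any $j$. Then $\mu(A_\delta\cap\hat A_\delta^c)=\mu(\bigcup_jA_j)\le\sum_j\mu(A_j)\to0$ in probability. Thus, w.p.a.1, $P(\theta\in A_\delta\mid X^n)\le\text{Const}\int_{\hat A_\delta}p(\theta)L(\theta)\,d\theta$. In addition, w.p.a.1, for some $\epsilon>0$,

$$L(\theta)=P(Z_\theta\ge0)\,e^{-\psi^T\hat m(\theta)+\psi^TV\psi/(2n)}\prod_j\psi_j\le\text{Const}\cdot P(Z_\theta\ge0)\,e^{\|\psi\|(\sup_{\theta\in\Theta}\|Em(X,\theta)\|+\epsilon)}\le\text{Const}\cdot\Phi\Big(\sqrt n\min_j\frac{\hat m_j(\theta)-(V\psi)_j/n}{\sqrt{V_{jj}}}\Big),$$

where the last step uses (B.2). Therefore, w.p.a.1,

$$P(\theta\in A_\delta\mid X^n)\le\text{Const}\int_{\hat A_\delta}p(\theta)\,\Phi\Big(\sqrt n\min_j\frac{\hat m_j(\theta)}{\sqrt{V_{jj}}}+O(n^{-1/2})\Big)d\theta\le\text{Const}\cdot\Phi\big(-\delta\sqrt n+O(n^{-1/2})\big)\le\text{Const}\cdot\Phi\Big(-\frac\delta2\sqrt n\Big),$$

which is $O(e^{-\delta^2n/8})$, since $\Phi(-x)\le e^{-x^2/2}$ for $x\ge0$.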

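2. Define

$$\Omega_n=\Big\{\theta:\min_j\frac{\hat m_j(\theta)}{\sqrt{V_{jj}}}>0\Big\}.$$

By Fatou's lemma, w.p.a.1,

$$\begin{aligned}\liminf_{n\to\infty}\int_Sp(\theta)L(\theta)\,d\theta&\ge\int_S\liminf_{n\to\infty}p(\theta)L(\theta)\,d\theta\ge\int_{S\cap\Omega\cap\Omega_n}\liminf_{n\to\infty}p(\theta)L(\theta)\,d\theta\\&\ge\text{Const}\int_{S\cap\Omega\cap\Omega_n}p(\theta)\liminf_{n\to\infty}\Big(1-p\,\Phi\Big(-\sqrt n\min_j\frac{\hat m_j(\theta)-(V\psi)_j/n}{\sqrt{V_{jj}}}\Big)\Big)d\theta\\&\ge\text{Const}\int_{S\cap\Omega\cap\Omega_n}p(\theta)\Big(1-p\limsup_{n\to\infty}\Phi\Big(-\sqrt n\min_j\frac{\hat m_j(\theta)}{\sqrt{V_{jj}}}+O(n^{-1/2})\Big)\Big)d\theta\\&\ge\text{Const}\cdot\inf_{\theta\in\Omega}p(\theta)\,\mu(S\cap\Omega\cap\Omega_n),\end{aligned}$$

where

$$\mu(S\cap\Omega\cap\Omega_n)=\mu(S\cap\Omega)-\mu(S\cap\Omega\cap\Omega_n^c)=\mu(S)-\mu(S\cap\Omega\cap\Omega_n^c).$$

Moreover, w.p.a.1,

$$\mu(S\cap\Omega\cap\Omega_n^c)\le\mu\Big(\Omega\cap\Big\{\theta:\min_j\frac{\hat m_j(\theta)}{\sqrt{V_{jj}}}\le0\Big\}\Big)\xrightarrow{p}\mu\Big(\Omega\cap\Big\{\theta:\min_jEm_j(X,\theta)\le0\Big\}\Big)\le\sum_j\mu\big(\theta:Em_j(X,\theta)=0\big)=0.$$

It follows that $\mu(S\cap\Omega\cap\Omega_n)\to\mu(S)>0$. Since $p(\theta)$ is also bounded away from zero on $\Omega$, $\liminf_{n\to\infty}P(\theta\in S\mid X^n)>0$ in probability. □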
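Both parts of Theorem 3.1 can be seen numerically in a toy model. The sketch below assumes a hypothetical interval-identified mean with moments $m_1=X_U-\theta$ and $m_2=\theta-X_L$. It also assumes a diagonal $V$, so that $P(Z_\theta\ge0)$ factorizes into one-dimensional normal probabilities, and a flat prior on an equally spaced grid. The likelihood takes the form $L(\theta)\propto P(Z_\theta\ge0)\,e^{-\psi^T\hat m(\theta)}$ used above, with constant factors dropped. It is an illustration only, not the authors' code, and all names are hypothetical.

```python
import math


def log_std_normal_cdf(x):
    # log Phi(x); erfc keeps accuracy in the lower tail, the floor avoids log(0)
    return math.log(max(0.5 * math.erfc(-x / math.sqrt(2.0)), 1e-300))


def log_lim_info_lik(theta, xbar_lo, xbar_hi, v_lo, v_hi, n, psi=(1.0, 1.0)):
    """log L(theta) up to an additive constant for the toy interval model:
    m_hat = (xbar_hi - theta, theta - xbar_lo), V = diag(v_hi, v_lo),
    Z_theta ~ N(m_hat - V psi / n, V / n) with independent coordinates."""
    m_hat = (xbar_hi - theta, theta - xbar_lo)
    v = (v_hi, v_lo)
    log_p_z = sum(
        log_std_normal_cdf(math.sqrt(n) * (m - vj * pj / n) / math.sqrt(vj))
        for m, vj, pj in zip(m_hat, v, psi)
    )
    return log_p_z - sum(pj * m for pj, m in zip(psi, m_hat))


def grid_posterior(grid, **model):
    """Posterior weights on an equally spaced grid under a flat prior."""
    logs = [log_lim_info_lik(t, **model) for t in grid]
    top = max(logs)  # subtract the max before exponentiating to avoid underflow
    w = [math.exp(l - top) for l in logs]
    total = sum(w)
    return [x / total for x in w]


def mass(grid, weights, lo, hi):
    """Posterior probability of lo <= theta <= hi."""
    return sum(w for t, w in zip(grid, weights) if lo <= t <= hi)
```

In this toy setting, with $\bar X_L=0$, $\bar X_U=1$, unit variances and $n=1000$, the posterior mass outside $[-0.1,1.1]$ is far below $10^{-3}$, while the interior subinterval $[0.1,0.9]$ keeps most of the mass. With $\psi_1\neq\psi_2$, the factor $e^{-\psi^T\hat m(\theta)}$ would tilt the posterior inside $\Omega$ instead of leaving it flat.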

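B.3. Proof of Theorem 3.2. In Theorem 2.1, let $A=\operatorname{int}(\Omega)$, which is dense in $\Omega$. For every $\omega\in\operatorname{int}(\Omega)$, $\exists\,R>0$ such that $B(\omega,R)\subset\Omega$. Since $\pi_n\to0$ but $P(\theta\in B(\omega,R)\mid X^n)$ is bounded away from $0$ according to part 2 of Theorem 3.1, we have that, for large $n$, $P(\theta\in B(\omega,R)\mid X^n)>\pi_n$. Therefore, by Theorem 2.1,

$$[F_g^{-1}(\pi_n),F_g^{-1}(1-\pi_n)]\to g(\Omega)\quad\text{in probability.}$$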

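B.4. Proof of Theorem 3.3. To show this theorem, the following lemmas are useful.

Lemma B.2. In probability,

$$\limsup_{n\to\infty}\max_{\theta\in\Theta}\ln p(\theta\mid X^n)<\infty,\tag{B.3}$$

and, $\forall\epsilon>0$,

$$\liminf_{n\to\infty}\inf_{\theta\in\Omega^{-\epsilon}}p(\theta\mid X^n)>0.\tag{B.4}$$

Proof. (B.3): For some $\epsilon>0$,

$$\limsup_{n\to\infty}\sup_{\theta\in\Theta}L(\theta)\le\prod_j\psi_j\,e^{\|\psi\|(\sup_{\theta\in\Theta}\|Em(X,\theta)\|+\epsilon)+\epsilon}<\infty.$$

Thus,

$$\limsup_{n\to\infty}\max_{\theta\in\Theta}\ln p(\theta\mid X^n)\le\text{Const}+\limsup_{n\to\infty}\max_{\theta\in\Theta}\ln\big[p(\theta)L(\theta)\big]\le C+\ln\Big(\sup_{\theta\in\Theta}p(\theta)\cdot\limsup_{n\to\infty}\sup_{\theta\in\Theta}L(\theta)\Big)<\infty.$$

(B.4): $\forall\epsilon>0$,

$$\liminf_{n\to\infty}\inf_{\theta\in\Omega^{-\epsilon}}L(\theta)\ge\text{Const}\cdot\liminf_{n\to\infty}\inf_{\theta\in\Omega^{-\epsilon}}P(Z_\theta\ge0)\,e^{-\|\psi\|(\sup_{\theta\in\Theta}\|Em(X,\theta)\|+\epsilon)}\ge C\cdot\liminf_{n\to\infty}\inf_{\theta\in\Omega^{-\epsilon}}P(Z_\theta\ge0)>0.$$

Here, $C$ denotes a positive constant. The last inequality follows since $Z_\theta\sim N_p(\hat m(\theta)-V\psi/n,\ V/n)$, $\Omega^{-\epsilon}\subset\Omega$ and $Em(X,\theta)\ge0$ on $\Omega$. □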
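Lemma B.3. In probability,

1. for all $\epsilon>0$,
$$\limsup_{n\to\infty}\sup_{\theta\in\Omega^{-\epsilon}}\Big|\max_{\omega\in\Theta}\ln p(\omega\mid X^n)-\ln p(\theta\mid X^n)\Big|<\infty;$$

2. if $\epsilon_n\prec n$, then $\forall\epsilon>0$,
$$\frac{\inf_{\theta\in(\Omega^{\epsilon})^c}|\ln p(\theta\mid X^n)|}{\epsilon_n}\to\infty.$$

Proof. 1. For each $n$,
$$\sup_{\theta\in\Omega^{-\epsilon}}\Big|\max_{\omega\in\Theta}\ln p(\omega\mid X^n)-\ln p(\theta\mid X^n)\Big|=\max_{\omega\in\Theta}\ln p(\omega\mid X^n)-\inf_{\theta\in\Omega^{-\epsilon}}\ln p(\theta\mid X^n).$$
The result follows immediately from Lemma B.2.

2. W.p.a.1, $\ln p(\theta\mid X^n)<0$ on $(\Omega^{\epsilon})^c$; hence,
$$\begin{aligned}\inf_{\theta\in(\Omega^{\epsilon})^c}|\ln p(\theta\mid X^n)|&=-\sup_{\theta\in(\Omega^{\epsilon})^c}\ln p(\theta\mid X^n)\ge-\text{Const}-\ln\sup_{\theta\in(\Omega^{\epsilon})^c}L(\theta)\\&\ge-C-\ln\sup_{\theta\in(\Omega^{\epsilon})^c}P(Z_\theta\ge0)\ge-C-\ln\sup_{\theta\in(\Omega^{\epsilon})^c}\Phi\Big(\sqrt n\min_j\frac{\hat m_j(\theta)-(V\psi)_j/n}{\sqrt{V_{jj}}}\Big).\end{aligned}$$
As shown in the proof of Lemma 3.1, there exists some $\delta>0$ such that $(\Omega^{\epsilon})^c\subset A_\delta$, where $A_\delta=\{\theta:\min_jEm_j(X,\theta)/\sqrt{V_{jj}}<-\delta\}$. Thus, w.p.a.1,
$$\begin{aligned}\inf_{\theta\in(\Omega^{\epsilon})^c}|\ln p(\theta\mid X^n)|&\ge-C-\ln\sup_{\theta\in A_\delta}\Phi\Big(\sqrt n\min_j\frac{\hat m_j(\theta)-(V\psi)_j/n}{\sqrt{V_{jj}}}\Big)\\&\ge-C-\ln\sup_{\theta\in A_\delta}\Phi\Big(\sqrt n\min_j\frac{\hat m_j(\theta)}{\sqrt{V_{jj}}}+O(n^{-1/2})\Big)\\&\ge-C-\ln\Phi\Big(-\frac\delta2\sqrt n\Big)\\&\ge C_1n+C_2\ln n+C_3,\end{aligned}$$
where $C_1>0$, $C_2$ and $C_3$ denote finite constants. Since $\epsilon_n\prec n$, this implies that $\inf_{\theta\in(\Omega^{\epsilon})^c}|\ln p(\theta\mid X^n)|/\epsilon_n\to\infty$ in probability.
□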
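The last display rests on the Gaussian tail behaving like $e^{-x^2/2}$. Mills' ratio gives $\frac{x}{1+x^2}\phi(x)\le\Phi(-x)\le\phi(x)/x$ for $x>0$. Consequently, $-\ln\Phi(-\frac\delta2\sqrt n)=\frac{\delta^2n}{8}+\frac12\ln n+O(1)$, which is the linear growth $C_1n+C_2\ln n+C_3$ used above. The following is a small Python check that assumes nothing beyond the standard normal distribution. The function names are hypothetical.

```python
import math


def std_normal_pdf(x):
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)


def std_normal_tail(x):
    """Phi(-x) = P(N(0, 1) > x), via erfc to keep accuracy far in the tail."""
    return 0.5 * math.erfc(x / math.sqrt(2.0))


def mills_bounds(x):
    """Lower and upper Mills'-ratio bounds on Phi(-x), valid for x > 0."""
    phi = std_normal_pdf(x)
    return x / (1.0 + x * x) * phi, phi / x


def neg_log_tail_per_n(delta, n):
    """-ln Phi(-delta * sqrt(n) / 2) / n, which approaches delta**2 / 8 from above."""
    return -math.log(std_normal_tail(0.5 * delta * math.sqrt(n))) / n


if __name__ == "__main__":
    for n in (100, 400, 1600):
        print(n, neg_log_tail_per_n(1.0, n))
```

With $\delta=1$, the printed ratios decrease toward $1/8$. The $\frac{\ln n}{2n}$ correction is why the convergence is slow.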

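Proof of Theorem 3.3. For all $\epsilon>0$, since $\epsilon_n\to\infty$, we have, by part 1 of Lemma B.3, that $\exists\,N\in\mathbb N$ such that, when $n>N$, for any $\theta\in\Omega^{-\epsilon}$,
$$\max_{\omega\in\Theta}\ln p(\omega\mid X^n)-\ln p(\theta\mid X^n)<\epsilon_n,\quad\text{w.p.a.1.}$$
Therefore, when $n>N$, $\Omega^{-\epsilon}\subset A_n$, which implies that $\limsup_{n\to\infty}\sup_{\theta\in\Omega}d(\theta,A_n)\le\epsilon$.

On the other hand, let $M=\liminf_{n\to\infty}\max_{\theta\in\Theta}\ln p(\theta\mid X^n)$. By (B.3) in Lemma B.2, $M<\infty$. Moreover, by (B.4),
$$M\ge\liminf_{n\to\infty}\inf_{\theta\in\Omega^{-\epsilon}}\ln p(\theta\mid X^n)\ge\ln\liminf_{n\to\infty}\inf_{\theta\in\Omega^{-\epsilon}}p(\theta\mid X^n)>-\infty.$$
Hence, $M\in\mathbb R$ and, by the definition of $M$, $\exists\,N_1\in\mathbb N$ such that, when $n>N_1$,
$$\max_{\theta\in\Theta}\ln p(\theta\mid X^n)>M-\epsilon.$$
In addition, $p(\theta\mid X^n)\to0$ in probability for every $\theta\in(\Omega^{\epsilon})^c$. Thus, for large $n$, $\ln p(\theta\mid X^n)<0$ on $(\Omega^{\epsilon})^c$, and $\exists\,N_2\in\mathbb N$ such that, when $n>N_2$,
$$\inf_{\theta\in(\Omega^{\epsilon})^c}|\ln p(\theta\mid X^n)|=-\sup_{\theta\in(\Omega^{\epsilon})^c}\ln p(\theta\mid X^n)>\epsilon_n-(M-\epsilon),$$
where the inequality follows by part 2 of Lemma B.3. Therefore, when $n>N_2$,
$$\sup_{\theta\in(\Omega^{\epsilon})^c}\ln p(\theta\mid X^n)<-\epsilon_n+(M-\epsilon).\tag{B.5}$$
However, when $n>\max\{N_1,N_2\}$, every $\theta\in A_n=\{\theta:\max_{\omega\in\Theta}\ln p(\omega\mid X^n)-\ln p(\theta\mid X^n)<\epsilon_n\}$ satisfies $\ln p(\theta\mid X^n)>\max_{\omega\in\Theta}\ln p(\omega\mid X^n)-\epsilon_n>M-\epsilon-\epsilon_n$. Comparing this with (B.5), we see that $\theta\notin(\Omega^{\epsilon})^c$. In other words, $d(\theta,\Omega)<\epsilon$. It follows that
$$\limsup_{n\to\infty}\sup_{\theta\in A_n}d(\theta,\Omega)\le\epsilon.$$
Since $\epsilon$ is arbitrary, $d_H(A_n,\Omega)\to0$ in probability. □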
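The set estimator $A_n$ can be sketched on a grid. The sketch below assumes a hypothetical toy interval model with moments $\bar X_U-\theta$ and $\bar X_L-\theta$ reversed to $\theta-\bar X_L$, so that $\Omega=[0,1]$. It also assumes $V=vI$, equal $\psi_j$, and a flat prior. It uses the choice $\epsilon_n=\ln n$, which satisfies $\epsilon_n\to\infty$ and $\epsilon_n\prec n$. The posterior normalizing constant cancels in $\max_\omega\ln p(\omega\mid X^n)-\ln p(\theta\mid X^n)$, so an unnormalized log posterior suffices. All names are hypothetical.

```python
import bisect
import math


def log_std_normal_cdf(x):
    return math.log(max(0.5 * math.erfc(-x / math.sqrt(2.0)), 1e-300))


def log_post(theta, n, lo=0.0, hi=1.0, v=1.0, psi=1.0):
    """Unnormalized log posterior for the toy interval model (flat prior, V = v I)."""
    m_hat = (hi - theta, theta - lo)
    return sum(
        log_std_normal_cdf(math.sqrt(n) * (m - v * psi / n) / math.sqrt(v)) - psi * m
        for m in m_hat
    )


def level_set_estimate(grid, n, eps_n):
    """A_n = {theta : max ln p - ln p(theta) < eps_n}, evaluated on a grid."""
    logs = [log_post(t, n) for t in grid]
    top = max(logs)
    return [t for t, l in zip(grid, logs) if top - l < eps_n]


def hausdorff_to_interval(points, lo, hi, step=1e-3):
    """Hausdorff distance between a finite point set and [lo, hi]; the sup over
    [lo, hi] is approximated on a grid with the given step."""
    pts = sorted(points)
    # sup over the point set of the distance to [lo, hi]
    d1 = max(max(lo - t, t - hi, 0.0) for t in pts)
    # sup over [lo, hi] of the distance to the nearest point of the set
    d2 = 0.0
    for i in range(int(round((hi - lo) / step)) + 1):
        y = lo + i * step
        j = bisect.bisect_left(pts, y)
        d2 = max(d2, min(abs(y - pts[q]) for q in (j - 1, j) if 0 <= q < len(pts)))
    return max(d1, d2)
```

In this toy model, $A_n$ overshoots $\Omega$ by roughly the point where $-\ln\Phi(-\sqrt n\,t)$ reaches $\epsilon_n$, that is, by about $\sqrt{2\ln n/n}$. The Hausdorff distance therefore shrinks as $n$ grows.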



APPENDIX C: PROOFS FOR SECTION 4 

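Lemma C.1. Suppose that $C_s=(M_s,\Theta_s)$, $\Omega_s=\{\theta\in\Theta_s:EM_s(X,\theta)\ge0\}$ and $\Omega_s$ is compact. Then, for some $\xi\in\Omega_s$ and normalization parameter $C$,

$$\operatorname*{plim}_{n\to\infty}p(C_s\mid X^n)=C\Big(\prod_j\psi_j\Big)p(C_s)\,P(\theta\in\Omega_s\mid C_s)\,e^{-\psi^TEM_s(X,\xi)}.\tag{C.1}$$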
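Proof. By the integral intermediate value theorem, the right-hand side of (C.1) can be written as

$$\text{RHS}=C\Big(\prod_j\psi_j\Big)p(C_s)\int I_{\Omega_s}(\theta)\,p(\theta\mid C_s)\,e^{-\psi^TEM_s(X,\theta)}\,d\theta.\tag{C.2}$$

On the other hand,

$$\begin{aligned}p(C_s\mid X^n)&=C\int\int_{[0,\infty)^m}\frac{1}{\sqrt{\det(2\pi V/n)}}\,e^{-\frac n2(\hat M_s(\theta)-\lambda)^TV^{-1}(\hat M_s(\theta)-\lambda)}\Big(\prod_j\psi_j\Big)e^{-\psi^T\lambda}\,p(\theta\mid C_s)\,p(C_s)\,d\theta\,d\lambda\\&=C\Big(\prod_j\psi_j\Big)p(C_s)\int p(\theta\mid C_s)\,P(Z_\theta\ge0)\,e^{-\psi^T\hat M_s(\theta)+\psi^TV\psi/(2n)}\,d\theta,\end{aligned}\tag{C.3}$$

where $Z_\theta\sim N_m(\hat M_s(\theta)-V\psi/n,\ V/n)$. Take the difference between (C.2) and (C.3):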

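$$|p(C_s\mid X^n)-\text{RHS}|\le\text{Const}\cdot p(C_s)\int p(\theta\mid C_s)\Big|I_{\Omega_s}(\theta)\,e^{-\psi^TEM_s(X,\theta)}-P(Z_\theta\ge0)\,e^{-\psi^T\hat M_s(\theta)+\psi^TV\psi/(2n)}\Big|\,d\theta.\tag{C.4}$$

If we let $A(\theta)=p(\theta\mid C_s)\big|I_{\Omega_s}(\theta)\,e^{-\psi^TEM_s(X,\theta)}-P(Z_\theta\ge0)\,e^{-\psi^T\hat M_s(\theta)+\psi^TV\psi/(2n)}\big|$, then (C.4) can be rewritten as

$$|p(C_s\mid X^n)-\text{RHS}|\le\text{Const}\cdot p(C_s)\Big(\int_{U_1}A(\theta)\,d\theta+\int_{U_2}A(\theta)\,d\theta+\int_{U_3}A(\theta)\,d\theta\Big),$$

where

$$\begin{aligned}U_1&=\{\theta\in\Theta_s:EM_s(X,\theta)>0\},\\U_2&=\{\theta\in\Theta_s:EM_s(X,\theta)\ge0,\ Em_j(X,\theta)=0\text{ for some }m_j\in M_s\},\\U_3&=\{\theta\in\Theta_s:Em_j(X,\theta)<0\text{ for some }m_j\in M_s\}.\end{aligned}$$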

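We next look at the integrals over $U_i$, $i=1,2,3$.

$U_1$: Note that $U_1=\{\theta\in\Theta_s:EM_s(X,\theta)>0\}$ and $Z_\theta\sim N_m(\hat M_s(\theta)-V\psi/n,\ V/n)$. For any $\epsilon>0$, by the uniform weak law of large numbers, w.p.a.1, $\sup_{\theta\in U_1}|P(Z_\theta\ge0)-I_{\Omega_s}(\theta)|<\epsilon$. Hence, for large $n$, w.p.a.1,

$$\sup_{\theta\in U_1}\Big|I_{\Omega_s}(\theta)\,e^{-\psi^TEM_s(X,\theta)}-P(Z_\theta\ge0)\,e^{-\psi^T\hat M_s(\theta)+\psi^TV\psi/(2n)}\Big|<\epsilon.$$

Hence,

$$\int_{U_1}A(\theta)\,d\theta<\epsilon\int_{U_1}p(\theta\mid C_s)\,d\theta\le\epsilon.$$

$U_2$: The Lebesgue measure of $U_2$ is $0$.

$U_3$: For every $\theta\in U_3$, $I_{\Omega_s}(\theta)=0$; hence,

$$A(\theta)=p(\theta\mid C_s)\,P(Z_\theta\ge0)\,e^{-\psi^T\hat M_s(\theta)+\psi^TV\psi/(2n)}.$$

W.p.a.1, $P(Z_\theta\ge0)<\epsilon$; thus, for large $n$, w.p.a.1,

$$\int_{U_3}A(\theta)\,d\theta\le\epsilon\int_{U_3}p(\theta\mid C_s)\,e^{-\psi^T\hat M_s(\theta)+\psi^TV\psi/(2n)}\,d\theta\le\epsilon\,e^{\|\psi\|(\sup_{\theta\in\Theta_s}\|EM_s(X,\theta)\|+\epsilon)+\epsilon}.$$

We have thus shown that $|p(C_s\mid X^n)-\text{RHS}|\le\text{Const}\cdot p(C_s)\,\epsilon$, w.p.a.1, with arbitrarily small $\epsilon$. □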
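The step from the $\lambda$-integral to the factor $P(Z_\theta\ge0)\,e^{-\psi^T\hat M_s(\theta)+\psi^TV\psi/(2n)}$ in (C.3) is a completion of the square. The check below is in one dimension and assumes the exponential prior density $\psi e^{-\psi\lambda}$ on $\lambda\ge0$ that the factor $(\prod_j\psi_j)e^{-\psi^T\lambda}$ suggests. Under that assumption, the identity reads $\int_0^\infty e^{-\frac n{2V}(\hat m-\lambda)^2}\psi e^{-\psi\lambda}\,d\lambda=\psi\sqrt{2\pi V/n}\;\Phi\big(\sqrt{n/V}\,(\hat m-V\psi/n)\big)\,e^{-\psi\hat m+\psi^2V/(2n)}$. The Python sketch compares the right-hand side with composite Simpson quadrature. The function names are hypothetical.

```python
import math


def closed_form(m_hat, v, n, psi):
    """psi * sqrt(2 pi v / n) * P(Z >= 0) * exp(-psi m_hat + psi^2 v / (2n)),
    with Z ~ N(m_hat - v psi / n, v / n)."""
    mu = m_hat - v * psi / n
    p_z = 0.5 * math.erfc(-mu * math.sqrt(n / v) / math.sqrt(2.0))
    return (psi * math.sqrt(2.0 * math.pi * v / n) * p_z
            * math.exp(-psi * m_hat + psi ** 2 * v / (2.0 * n)))


def simpson_integral(m_hat, v, n, psi, steps=20_000):
    """Composite Simpson rule for the lambda-integral on [0, upper]; beyond
    `upper` (40 standard deviations past the kernel's center) the integrand is negligible."""
    def f(lam):
        return math.exp(-n * (m_hat - lam) ** 2 / (2.0 * v)) * psi * math.exp(-psi * lam)

    upper = max(m_hat, 0.0) + 40.0 * math.sqrt(v / n)
    h = upper / steps  # steps must be even for Simpson's rule
    total = f(0.0) + f(upper)
    for i in range(1, steps):
        total += (4.0 if i % 2 else 2.0) * f(i * h)
    return total * h / 3.0
```

The mean shift $-V\psi/n$ in $Z_\theta$ and the factor $e^{\psi^TV\psi/(2n)}$ are both $O(1/n)$ effects of the prior on $\lambda$. This is why they disappear in the probability limit (C.1).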

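C.1. Proof of Theorem 4.1.

1. The result follows immediately from Lemma C.1 and Assumption 4.2.

2. For some normalization parameter $C$,

$$p(C_s\mid X^n)=C\int\int_{[0,\infty)^m}\frac{1}{\sqrt{\det(2\pi V/n)}}\,e^{-\frac n2(\hat M_s(\theta)-\lambda)^TV^{-1}(\hat M_s(\theta)-\lambda)}\Big(\prod_j\psi_j\Big)e^{-\psi^T\lambda}\,p(\theta\mid C_s)\,p(C_s)\,d\lambda\,d\theta.$$

Since $V$ is positive definite and $C_s=(M_s,\Theta_s)$ is not true, $\exists\,r>0$ such that

$$\inf_{\theta\in\Theta_s,\ \lambda\in[0,\infty)^m}(EM_s(X,\theta)-\lambda)^TV^{-1}(EM_s(X,\theta)-\lambda)>r.$$

Hence, w.p.a.1, $\inf_{\theta\in\Theta_s,\ \lambda\in[0,\infty)^m}(\hat M_s(\theta)-\lambda)^TV^{-1}(\hat M_s(\theta)-\lambda)>r$. Therefore, w.p.a.1, for large $n$,

$$\begin{aligned}p(C_s\mid X^n)&\le\frac{C}{\sqrt{\det(2\pi V/n)}}\,e^{-nr/2}\,p(C_s)\Big(\prod_j\psi_j\Big)\int_{[0,\infty)^m}e^{-\psi^T\lambda}\,d\lambda\\&\le\text{Const}\cdot n^{m/2}e^{-nr/2}\,p(C_s)\Big(\prod_j\psi_j\Big)\int_{[0,\infty)^m}e^{-\psi^T\lambda}\,d\lambda\\&\le e^{-nr/4}\,p(C_s).\end{aligned}$$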

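C.2. Proof of Theorem 4.2. Let $A$ denote the index set corresponding to the moment inequalities that are selected by $C_s^1$ but not by $C_s^2$. Then $p(C_s^1\mid X^n)$ has a $\prod_{j\in A}\psi_j$ term that does not show up in $p(C_s^2\mid X^n)$. If $p(C_s^1)=p(C_s^2)$, then, by Lemma C.1,

$$\operatorname*{plim}_{n\to\infty}\mathrm{BF}=\Big(\prod_{j\in A}\psi_j\Big)\frac{P(\theta\in\Omega_s^1\mid C_s^1)}{P(\theta\in\Omega_s^2\mid C_s^2)}\cdot\frac{e^{-\psi^TEM_s^1(X,\xi_1)}}{e^{-\psi^TEM_s^2(X,\xi_2)}}.$$

Note that since $C_s^i$, $i=1,2$, are both true, the integral intermediate value theorem guarantees that $\xi_i\in\Omega_s^i$; hence, $EM_s^i(X,\xi_i)\ge0$ for $i=1,2$. It follows that

$$\operatorname*{plim}_{n\to\infty}\mathrm{BF}\ge\frac{P(\theta\in\Omega_s^1\mid C_s^1)}{P(\theta\in\Omega_s^2\mid C_s^2)}\,e^{-\|\psi\|\sup_{\theta\in\Theta}\|EM_s^1(X,\theta)\|}\prod_{j\in A}\psi_j\ge\cdots>1.$$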



The third inequality is due to Assumption 4.6 and the last inequality follows 

from nlcn\. 



C.3. Proof of Theorem 4.3.

Approach 1. Suppose that $p(C_s)\propto n^{a[\dim(M_s)-\dim(\Theta_s)]}$ for some $a>0$.

1. If $C^*$ is false, then, by part 2 of Theorem 4.1, $\exists\,\beta>0$ such that

$$p(C^*\mid X^n)\le e^{-\beta n}p(C^*),$$

which is exponentially small. However, there exists at least one true combination $C_s$ with posterior distribution bounded away from zero. Hence, w.p.a.1, $p(C_s\mid X^n)>p(C^*\mid X^n)$, a contradiction.

2. Since $C^*$ is true, $\Omega^*$, the identified region defined by it, satisfies $P(\theta\in\Omega^*\mid C^*)>0$. By Lemma C.1, $p(C^*\mid X^n)=O_p(n^{a[\dim(M_*)-\dim(\Theta_*)]})$. It follows immediately from the definition of $C^*$ that $C^*$ has the largest value of $\dim(M_s)-\dim(\Theta_s)$.

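Approach 2. Suppose that $p(C_s)$ is the uniform prior of $C_s$, and we put multivariate normal priors on unrestricted parameters. For any candidate combination $C_s$,

$$p(C_s\mid X^n)\propto\int\int\int_{\Theta_s\times[0,\infty)^m\times\mathbb R^{p-m}}L(X^n\mid\theta_s,\lambda,C_s)\,p(\theta_s\mid C_s)\,p(\lambda_s\mid C_s)\,p(\lambda_s^*\mid C_s)\,d\theta_s\,d\lambda_s\,d\lambda_s^*.$$

Let

$$L(X^n\mid\theta_s,\lambda_s,C_s)=\int L(X^n\mid\theta_s,\lambda,C_s)\,p(\lambda_s^*\mid C_s)\,d\lambda_s^*,\tag{C.5}$$

$$L(X^n\mid\theta_s,C_s)=\int_{[0,\infty)^m}L(X^n\mid\theta_s,\lambda_s,C_s)\,p(\lambda_s\mid C_s)\,d\lambda_s.\tag{C.6}$$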

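A tedious calculation shows that

$$L(X^n\mid\theta_s,\lambda_s,C_s)=\frac{1}{\sqrt{\det(2\pi S_n)}}\exp\Big\{-\frac12\big(\hat M_s(\theta)-\lambda_s,\ \hat M_s^*(\theta)\big)\,S_n^{-1}\,\big(\hat M_s(\theta)-\lambda_s,\ \hat M_s^*(\theta)\big)^T\Big\},$$

where

$$S_n=\frac Vn+\Sigma;\qquad\text{we write}\qquad S_n^{-1}=n\begin{pmatrix}S_1&S_2\\S_2^T&S_3\end{pmatrix}.$$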

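We can then calculate (C.6):

$$L(X^n\mid\theta_s,C_s)=\text{Const}\cdot\frac{1}{\sqrt{\det(V_2)}}\,P(Z_\theta\ge0)\,e^{T(\theta)},$$

where

$V_2=V_{22}+n\sigma_n^2I_{p-m}$, with $V_{22}$ being the lower diagonal block of $V$;

$Z_\theta\sim N_m\big(\hat M_s(\theta)+S_1^{-1}S_2\hat M_s^*(\theta)-\tfrac1nS_1^{-1}\psi,\ \tfrac1nS_1^{-1}\big)$;

$T(\theta)=-\tfrac n2\hat M_s^*(\theta)^T(V_{22}+n\sigma_n^2I_{p-m})^{-1}\hat M_s^*(\theta)-\psi^T\big[S_1^{-1}S_2\hat M_s^*(\theta)+\hat M_s(\theta)\big]+\tfrac1{2n}\psi^TS_1^{-1}\psi$.

Given that $\|V\|=O(1)$, one can show that $\|S_1\|=O(1)$, $\|S_2\|=O\big(\frac{1}{n\sigma_n^2}\big)$ and $\|S_3\|=O\big(\frac{1}{n\sigma_n^2}\big)$.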

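Define an operator of $n^{-1}$ and $M(\cdot)$,

$$g(n^{-1},M(\theta_s))=(n\sigma_n^2)^{(\dim(\Theta_s)+p-m)/2}L(X^n\mid\theta_s,C_s)\,p(\theta_s\mid C_s),$$

where $L(X^n\mid\theta_s,C_s)$ is the integrated limited information likelihood of $(\theta_s,C_s)$, obtained by integrating out $\lambda$. We use the factor $(n\sigma_n^2)^{(\dim(\Theta_s)+p-m)/2}$ for rescaling so that

$$g(n^{-1},M(\theta_s))=O_p(1).$$

Hence, w.p.a.1, without changing the orders, we have

$$g(n^{-1},M(\theta_s))=(n\sigma_n^2)^{(\dim(\Theta_s)+p-m)/2}L(X^n\mid\theta_s,C_s)\,p(\theta_s\mid C_s)=\text{Const}\cdot P(Z_\theta\ge0)\,e^{T(\theta)}\,e^{-\theta_s^T\theta_s/(2n\sigma_n^2)}.$$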

This yields that

$$p(C_s\mid X^n)=\text{Const}\cdot\int L(X^n\mid\theta_s,C_s)\,p(\theta_s\mid C_s)\,d\theta_s=(n\sigma_n^2)^{-(\dim(\Theta_s)+p-m)/2}\,\text{Const}\cdot\int g(n^{-1},M(\theta_s))\,d\theta_s.$$

The following lemma is needed before proceeding.

Lemma C.2. Under Assumption 4.6, in probability,

$$\lim_{n\to\infty}\int g(n^{-1},M(\theta_s))\,d\theta_s=\int\lim_{n\to\infty}g(n^{-1},M(\theta_s))\,d\theta_s.$$

The proof is given at the end of this section.

Hence, by Lemma C.2, if $C_s$ is true, then, in probability,

$$\lim_{n\to\infty}\int g(n^{-1},M(\theta_s))\,d\theta_s=\int\lim_{n\to\infty}g(n^{-1},M(\theta_s))\,d\theta_s=\text{Const}\cdot\int I_{\{\theta\in\Theta_s:EM_s(\theta)\ge0\}}\,e^{-\psi^TEM_s(\theta)}\,d\theta=O(1).$$

The second equality is due to $P(Z_\theta\ge0)\to I_{\{\theta\in\Theta_s:EM_s(\theta)>0\}}$ and $T(\theta)\to-\psi^TEM_s(\theta)$. The last equality follows since $\{\theta\in\Theta_s:EM_s(\theta)\ge0\}$ is the identified region of $C_s$ and is assumed to be compact.

Hence, $p(C_s\mid X^n)=O_p\big((n\sigma_n^2)^{-(\dim(\Theta_s)+p-m)/2}\big)$, which implies that the optimal $C^*$ that maximizes $p(C_s\mid X^n)$ has the largest value of $m-\dim(\Theta_s)$.
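The selection rule implied by this rate comparison can be sketched as a toy illustration. The sketch summarizes each candidate by hypothetical fields for $m$, $\dim(\Theta_s)$ and whether the combination is true, and it ignores all constants. True combinations are ranked by the log of the order $(n\sigma_n^2)^{-(\dim(\Theta_s)+p-m)/2}$ of $p(C_s\mid X^n)$. False ones, whose posterior is exponentially small, are represented by $-\infty$. In practice the truth flag is unknown; here it stands in for the asymptotic separation established above.

```python
import math
from dataclasses import dataclass


@dataclass
class Candidate:
    name: str
    m: int            # number of moment inequalities used by C_s
    dim_theta: int    # dim(Theta_s)
    is_true: bool     # hypothetical flag: whether the combination is true


def log_posterior_order(c, p, n, sigma2):
    """log of (n sigma_n^2)^{-(dim(Theta_s) + p - m)/2} for a true combination;
    -inf stands in for the exponentially small posterior of a false one."""
    if not c.is_true:
        return -math.inf
    return -0.5 * (c.dim_theta + p - c.m) * math.log(n * sigma2)


def select(cands, p, n, sigma2):
    """Candidate with the largest posterior order (largest m - dim(Theta_s))."""
    return max(cands, key=lambda c: log_posterior_order(c, p, n, sigma2))
```

With $p=4$ and $n\sigma_n^2>1$, a true candidate with $m=3$, $\dim(\Theta_s)=1$ beats true candidates with $(m,\dim)=(2,1)$ or $(3,2)$. This matches "the most moment inequalities and the simplest model".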

Proof of Lemma C.2. We apply the following theorem.

Theorem C.1 [Billingsley (1986), Theorem 16.8]. Let $f(t,w):T\times W\to\mathbb R$ be a real-valued function, absolutely integrable with respect to $w$. Suppose that:

1. $f(t,w)$ is continuous on a neighborhood of $t=t_0$ for almost all $w\in W$;

2. there exists a function $g:W\to\mathbb R$ such that $|f(t,w)|\le g(w)$ for any $t\in T$ and $\int_Wg(w)\,dw<\infty$.

Then

$$\lim_{t\to t_0}\int_Wf(t,w)\,dw=\int_W\lim_{t\to t_0}f(t,w)\,dw.$$

Proof. $g(n^{-1},M(\theta_s))=\text{Const}\cdot P(Z_\theta\ge0)\,e^{T(\theta)}\,e^{-\theta_s^T\theta_s/(2n\sigma_n^2)}$; in $T(\theta)$, the sample moments and $\theta_s$ are separated. If we let $W$ denote a vector of all the sample moments, $W=(ZY_1,ZY_2,ZX^T)$, then we can write $g(n^{-1},M(\theta_s))=g(n^{-1},W,\theta_s)$. It suffices to show that

$$\lim_{(n^{-1},W)\to(0,EW)}\int g(n^{-1},W,\theta_s)\,d\theta_s=\int\lim_{(n^{-1},W)\to(0,EW)}g(n^{-1},W,\theta_s)\,d\theta_s.$$

We proceed by verifying the conditions in Theorem C.1.

Condition 1. Note that $P(Z_\theta\ge0)\to I_{\Omega_s}$ for almost all $\theta_s$, except on the zero-measure set $\{\theta:\exists\,j,\ Em_{sj}(\theta)=0\}$. Hence, it is straightforward to verify that $g(n^{-1},W,\theta_s)$ is continuous on a small neighborhood of $(0,EW)$ for almost all $\theta_s$.

Condition 2. In this case, the moment inequality functions are all linear in $\theta$; hence, we can write $M_s^*(\theta)=a_1+B_1\theta$ and $M_s(\theta)=a_2+B_2\theta$, where $B_i$ are matrices. We can then show that, w.p.a.1 (we omit some intermediate calculations),

$$e^{T(\theta)}\,e^{-\theta_s^T\theta_s/(2n\sigma_n^2)}\le Ce^{-\beta^T\theta},\tag{C.7}$$

where $C>0$ and $\beta$ is a constant vector.
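Furthermore, since $\operatorname{rank}(EZX^T)=\dim(X)$, there exists $c>0$ such that, for every $\theta_s\in\Theta_s$, we can always find a component $Em_{sk}(\theta)$ of $EM_s(\theta)$ such that $Em_{sk}(\theta)\le-c\|\theta\|$. Write $\theta_s=\omega r$, where $\omega$ and $r$ denote the unit direction vector and the radius of $\theta_s$, respectively. Then $Em_{sk}(\theta)\le-cr$ (here, $k$ and $r$ depend on $\theta$, but $c$ does not). For $Z_\theta\sim N_m(EM_s(\theta)+O_p(n^{-1/2}),\ S_1^{-1}/n)$ and any $\epsilon>0$, w.p.a.1, for some $V_k>0$,

$$\begin{aligned}P(Z_\theta\ge0)&=P\big(N_m(\hat M_s(\theta)+O_p(n^{-1}),\ S_1^{-1}/n)\ge0\big)\\&\le P\big(N(\hat m_{sk}(\theta)+O_p(n^{-1}),\ V_k/n)\ge0\big)\\&\le P\big(N(Em_{sk}(\theta)+\epsilon,\ V_k/n)\ge0\big)\\&\le\sqrt{\frac{V_k}{n}}\,\frac{1}{\sqrt{2\pi}\,(cr-\epsilon)}\,e^{-n(cr-\epsilon)^2/(2V_k)}.\end{aligned}\tag{C.8}$$

The last inequality follows from Mills' ratio inequality. We can choose $\epsilon=cr/2$; then, for large $n$, $P(Z_\theta\ge0)\le\text{Const}\cdot e^{-nc^2r^2/(8V_k)}$. Combining with (C.7), we obtain an integrable function that upper bounds $g(n^{-1},M(\theta_s))$: for all $n$,

$$\begin{aligned}g(n^{-1},M(\theta_s))&\le\text{Const}\cdot e^{-\beta^T\theta}e^{-c^2r^2/(8V_k)}\\&=\text{Const}\cdot e^{-\frac{c^2}{8V_k}\left(r+\frac{4V_k\beta^T\omega}{c^2}\right)^2}e^{2V_k(\beta^T\omega)^2/c^2}\\&\le\text{Const}\cdot e^{-\frac{c^2}{8V_k}\left(r+\frac{4V_k\beta^T\omega}{c^2}\right)^2}.\end{aligned}$$

To see that this upper bound function is integrable, write $\theta=r\omega$, so that

$$\int_{\Theta_s}e^{-\frac{c^2}{8V_k}\left(r+\frac{4V_k\beta^T\omega}{c^2}\right)^2}d\theta_s\le S(\{\omega:\|\omega\|=1\})\int_0^\infty\sup_{\|\omega\|=1}e^{-\frac{c^2}{8V_k}\left(r+\frac{4V_k\beta^T\omega}{c^2}\right)^2}r^{\dim(\Theta_s)-1}\,dr<\infty,$$

where $S(\{\omega:\|\omega\|=1\})$ denotes the surface area of the unit ball $\{\omega:\|\omega\|=1\}$.
□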
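The equality in the bound on $g(n^{-1},M(\theta_s))$ is a completion of the square in $r$. Written out with $\beta^T\theta=r\,\beta^T\omega$ and $\|\omega\|=1$, and using $|\beta^T\omega|\le\|\beta\|$ (Cauchy-Schwarz) in the last step, it reads:

```latex
\begin{aligned}
-\frac{c^2r^2}{8V_k}-r\,\beta^T\omega
&=-\frac{c^2}{8V_k}\Big(r^2+\frac{8V_k\,\beta^T\omega}{c^2}\,r\Big)\\
&=-\frac{c^2}{8V_k}\Big(r+\frac{4V_k\,\beta^T\omega}{c^2}\Big)^2+\frac{2V_k(\beta^T\omega)^2}{c^2}\\
&\le-\frac{c^2}{8V_k}\Big(r+\frac{4V_k\,\beta^T\omega}{c^2}\Big)^2+\frac{2V_k\|\beta\|^2}{c^2}.
\end{aligned}
```

The last term does not depend on $\theta$, which is why it can be absorbed into the constant.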



Completion of the Proof of Theorem 4.3, Approach 2. It is left to show that the posterior of a false combination is exponentially small. Let $C_s=(EM_s,\Theta_s)$ be a false combination. Then, by definition, we can write

$$\delta=\inf_{\theta\in\Theta_s,\ \lambda\in[0,\infty)^{\dim(\lambda)}}\|EM_s(X,\theta)-\lambda\|^2>0.$$

Define a compact ball $B(d_n)=\{\theta:\|\theta\|\le d_n\}$ and $U_n=\{\theta:\|\theta\|>d_n\}$ for some radius $d_n\to\infty$, with the rate to be specified below. Then

$$p(C_s\mid X^n)\propto\int_{B(d_n)}L(X^n\mid\theta,C_s)\,p(\theta_s\mid C_s)\,d\theta_s+\int_{U_n}L(X^n\mid\theta,C_s)\,p(\theta_s\mid C_s)\,d\theta_s.$$

On one hand, w.p.a.1, we can show that

$$L(X^n\mid\theta,C_s)\le\text{Const}\cdot e^{-\delta n/2}\det(S_n)^{-1/2}\,e^{n|M(\theta)^TS_3M(\theta)|}.$$

Hence, $\int_{B(d_n)}L(X^n\mid\theta,C_s)\,p(\theta_s\mid C_s)\,d\theta_s\le\text{Const}\,(n\sigma_n^2)^{-(\dim(\Theta_s)+p-m)/2}e^{-\delta n/2}\int_{B(d_n)}e^{n|M(\theta)^TS_3M(\theta)|}\,d\theta_s$. Note that $\|S_3\|=O\big(\frac{1}{n\sigma_n^2}\big)$ and, in this example, $M(\theta)$ is linear in $\theta$; hence, $n\sup_{\theta\in B(d_n)}|M(\theta)^TS_3M(\theta)|\le\text{Const}\,(d_n/\sigma_n)^2$, w.p.a.1. Assuming that $(d_n/\sigma_n)^2=o(n)$, we have

$$\int_{B(d_n)}L(X^n\mid\theta,C_s)\,p(\theta_s\mid C_s)\,d\theta_s\le(n\sigma_n^2)^{-(\dim(\Theta_s)+p-m)/2}e^{-\delta n/2}e^{C(d_n/\sigma_n)^2}d_n^{\dim(\Theta_s)}\le e^{-an}$$

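for some $a>0$. On the other hand,

$$\int_{U_n}L(X^n\mid\theta,C_s)\,p(\theta_s\mid C_s)\,d\theta_s\propto(n\sigma_n^2)^{-(\dim(\Theta_s)+p-m)/2}\int_{U_n}P(Z_\theta\ge0)\,e^{T(\theta)}\,e^{-\theta^T\theta/(2n\sigma_n^2)}\,d\theta.$$

We use (C.7), $e^{T(\theta)}e^{-\theta^T\theta/(2n\sigma_n^2)}\le\text{Const}\cdot e^{-\beta^T\theta}$ for some constant vector $\beta$. Combining with (C.8) and using the same trick as before by writing $\theta=\omega\|\theta\|$, we have

$$\int_{U_n}L(X^n\mid\theta,C_s)\,p(\theta_s\mid C_s)\,d\theta_s\le\text{Const}\cdot\int_{d_n}^\infty e^{-a(r+b)^2}\,dr\cdot S(\{\omega:\|\omega\|=1\}),$$

where $a>0$ and $b\in\mathbb R$ are constants. By Mills' ratio inequality, it is less than $e^{-ad_n^2}$ for some $a>0$. □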
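The final tail estimate can also be checked directly. For constants $a>0$ and $b\in\mathbb R$, $\int_d^\infty e^{-a(r+b)^2}\,dr=\frac12\sqrt{\pi/a}\,\operatorname{erfc}(\sqrt a\,(d+b))$. When $b<0$ and $d\ge2|b|$, we have $(d+b)^2\ge d^2/4$. Together with $\operatorname{erfc}(x)\le e^{-x^2}$ for $x\ge0$, this bounds the integral by $\frac12\sqrt{\pi/a}\,e^{-ad^2/4}$, which is a bound of the form $e^{-a'd_n^2}$ once $d$ is large. The sketch below uses hypothetical names and illustrative constants. It compares the closed form with Simpson quadrature and with $e^{-ad^2/8}$.

```python
import math


def gaussian_tail_integral(a, b, d):
    """Exact value of the integral of exp(-a (r + b)^2) over r in [d, inf)."""
    return 0.5 * math.sqrt(math.pi / a) * math.erfc(math.sqrt(a) * (d + b))


def simpson_tail(a, b, d, length=60.0, steps=60_000):
    """Composite Simpson rule on [d, d + length]; the remaining tail is negligible
    for the moderate constants used here."""
    def f(r):
        return math.exp(-a * (r + b) ** 2)

    h = length / steps  # steps must be even for Simpson's rule
    total = f(d) + f(d + length)
    for i in range(1, steps):
        total += (4.0 if i % 2 else 2.0) * f(d + i * h)
    return total * h / 3.0


if __name__ == "__main__":
    for d in (6, 10, 20):
        print(d, gaussian_tail_integral(0.5, -2.0, d), math.exp(-0.5 * d * d / 8.0))
```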